# Serve gpt-j-6B on SageMaker with DJLServing using PySDK

---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

---

### Update pip packages to the latest versions

In [None]:
%%bash
pip install -U pip --quiet
pip install -U sagemaker --quiet
pip install -U boto3 --quiet

pip install -U transformers --quiet

### Configure the instance type, execution role, and session

In [None]:
import sagemaker
from sagemaker.s3 import S3Uploader
from transformers import AutoModel, AutoTokenizer

# Replace with your own settings
instance_type = "ml.g5.12xlarge"

role = sagemaker.get_execution_role()  # execution role for the endpoint
session = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
region = session.boto_region_name  # public attribute; avoids relying on the private _region_name

### Download model from Hugging Face

Downloading the model from the Hugging Face Hub is time-consuming and slows down SageMaker host startup.
We recommend downloading the model once and uploading the uncompressed artifacts to an S3 bucket to speed up SageMaker startup.

In [None]:
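# from transformers import AutoModelForCausalLM

# model_id = "EleutherAI/gpt-j-6B"

# AutoModelForCausalLM keeps the language-modeling head needed for text generation;
# plain AutoModel would load the base model and drop that head before saving.
# model = AutoModelForCausalLM.from_pretrained(model_id)
# model.save_pretrained("gpt-j-6B")

# tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenizer.save_pretrained("gpt-j-6B")

# bucket = session.default_bucket()  # bucket to house artifacts
# s3_location = f"s3://{bucket}/djl-serving/gpt-j-6B"
# S3Uploader.upload("gpt-j-6B", s3_location)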

For demo purposes, we use the gpt-j-6b-model artifacts from our S3 bucket.

In [None]:
pretrained_model_location = f"s3://sagemaker-example-files-prod-{region}/models/gpt-j-6b-model/"
print(f"Pretrained model will be downloaded from ---- > {pretrained_model_location}")
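
Before deploying, it can help to confirm that the artifacts are actually present under that prefix. The following is a minimal sketch, not part of the original notebook: `split_s3_uri` and `list_artifact_keys` are hypothetical helper names, and the S3 client is passed in so the helpers can be exercised without AWS credentials. The only AWS call used is boto3's `list_objects_v2`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix' into a (bucket, prefix) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_artifact_keys(s3_client, uri, max_keys=20):
    """Return up to max_keys object keys stored under the given S3 URI."""
    bucket, prefix = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    # "Contents" is absent from the response when no objects match the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage in the notebook (needs credentials with read access to the bucket):
# import boto3
# print(list_artifact_keys(boto3.client("s3"), pretrained_model_location))
```

An empty list would suggest that the prefix is wrong or that the artifacts are not available in the current region.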

### Deploy the model to SageMaker

In [None]:
from sagemaker.djl_inference import DJLModel

model = DJLModel(
    pretrained_model_location,
    role,
    task="text-generation",
    number_of_partitions=2,
    data_type="fp16",
)

predictor = model.deploy(initial_instance_count=1, instance_type=instance_type)
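
To see why `number_of_partitions=2` with `data_type="fp16"` fits comfortably on this instance, a rough estimate of the weight memory per GPU is useful. The sketch below is back-of-the-envelope arithmetic under stated assumptions: about 6.05 billion parameters for GPT-J-6B, 2 bytes per fp16 parameter, and weights split evenly across partitions. It ignores activations, the key-value cache, and framework overhead, so real usage will be higher. `weight_gib_per_gpu` is a hypothetical helper, not an SDK function.

```python
# Bytes needed to store one parameter for each data type.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib_per_gpu(num_params, data_type, number_of_partitions):
    """Estimate weight memory per GPU in GiB, assuming an even split."""
    total_bytes = num_params * BYTES_PER_PARAM[data_type]
    return total_bytes / number_of_partitions / 2**30


# Assumed parameter count for GPT-J-6B: roughly 6.05e9.
per_gpu = weight_gib_per_gpu(6.05e9, "fp16", 2)
print(f"~{per_gpu:.1f} GiB of weights per GPU")
```

Each of the four A10G GPUs on an `ml.g5.12xlarge` has 24 GB of memory, so under these assumptions the weights alone leave substantial headroom for activations and generation state.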

### Run inference using your endpoint

In [None]:
data = {
    "inputs": [
        "The ability to spread butter on toast is",
        "Video games are truly the",
    ],
    "parameters": {
        "max_length": 200,
        "temperature": 0.1,
    },
}
outputs = predictor.predict(data)
for output in outputs:
    print(output["generated_text"])
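
When sending several prompts at once, it is often convenient to keep each prompt next to its completion. The sketch below assumes, as the loop above suggests, that the response is a list with one dictionary per prompt containing a `generated_text` key, and additionally assumes the outputs come back in the same order as the inputs. `pair_prompts_with_outputs` is a hypothetical helper, and the sample response is made up for illustration.

```python
def pair_prompts_with_outputs(prompts, outputs):
    """Return (prompt, generated_text) pairs, assuming outputs preserve input order."""
    if len(prompts) != len(outputs):
        raise ValueError(f"Expected {len(prompts)} outputs, got {len(outputs)}")
    return [(prompt, output["generated_text"]) for prompt, output in zip(prompts, outputs)]


# Illustrative stand-in for a real endpoint response.
sample_prompts = ["Video games are truly the"]
sample_outputs = [{"generated_text": "Video games are truly the (model continuation)"}]

for prompt, text in pair_prompts_with_outputs(sample_prompts, sample_outputs):
    print(f"PROMPT: {prompt}\nOUTPUT: {text}\n")
```

In the notebook this would be called as `pair_prompts_with_outputs(data["inputs"], outputs)`.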

### Clean up your resources after testing

In [None]:
# Delete SageMaker endpoint and model
predictor.delete_endpoint()
model.delete_model()
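
Because a GPU endpoint keeps accruing charges until it is deleted, one option is to tie the cleanup to a context manager so it runs even if an inference call raises. This is a sketch of that pattern, not something the original notebook does: `temporary_endpoint` is a hypothetical helper that relies only on the `deploy`, `delete_endpoint`, and `delete_model` calls already used above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(model, **deploy_kwargs):
    """Deploy the model, yield the predictor, and always clean up afterwards."""
    predictor = model.deploy(**deploy_kwargs)
    try:
        yield predictor
    finally:
        try:
            predictor.delete_endpoint()
        finally:
            # Runs even if deleting the endpoint fails.
            model.delete_model()


# Usage in the notebook:
# with temporary_endpoint(model, initial_instance_count=1, instance_type=instance_type) as predictor:
#     outputs = predictor.predict(data)
```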

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/inference|generativeai|deepspeed|GPT-J-6B_DJLServing_with_PySDK.ipynb)
