# Introduction to SageMaker Built-In Algorithms - Text Generation

---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy the OpenLLaMA model for text generation. OpenLLaMA is a permissively licensed (Apache-2.0) open-source reproduction of [Meta AI’s LLaMA 7B](https://huggingface.co/decapoda-research/llama-7b-hf) trained on the [RedPajama dataset](https://www.together.xyz/blog/redpajama), which is itself a reproduction of the LLaMA training dataset and contains over 1.2 trillion tokens.

---

In [None]:
!pip uninstall -y sagemaker --quiet
!pip install sagemaker --quiet

In [None]:
model_id, model_version = (
    "huggingface-textgeneration-open-llama",
    "*",
)

In [None]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.serializers import JSONSerializer

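# Load the JumpStart model; "*" resolves to the latest available version
my_model = JumpStartModel(model_id=model_id, model_version=model_version)
# Deploy to a real-time endpoint using the model's default instance type
predictor = my_model.deploy()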

predictor.serializer = JSONSerializer()
predictor.content_type = "application/json"

payload = {
    "text_inputs": "Building a website can be done in 10 simple steps:",
    "max_length": 110,
    "no_repeat_ngram_size": 3,
}
response = predictor.predict(payload)
print(response["generated_texts"][0])

In [None]:
# Sentiment analysis with a few-shot prompt
payload = {
    "text_inputs": (
        'Tweet: "I hate it when my phone battery dies."\n'
        "Sentiment: Negative\n"
        "###\n"
        'Tweet: "My day has been :+1:"\n'
        "Sentiment: Positive\n"
        "###\n"
        'Tweet: "This is the link to the article"\n'
        "Sentiment: Neutral\n"
        "###\n"
        'Tweet: "This new music video was incredible"\n'
        "Sentiment:"
    )
}
response = predictor.predict(payload)
print(response["generated_texts"][0])

In [None]:
# Question answering
payload = {
    "text_inputs": "Could you remind me when was the C programming language invented?",
    "max_length": 34,
}
response = predictor.predict(payload)
print(response["generated_texts"][0])

In [None]:
# Recipe generation
payload = {"text_inputs": "What is the recipe for a delicious lemon cheesecake?", "max_length": 70}
response = predictor.predict(payload)
print(response["generated_texts"][0])

### Supported Parameters

***
This model supports the following parameters at inference time:

* **max_length:** Model generates text until the output length (which includes the input context length) reaches `max_length`. If specified, it must be a positive integer.
* **num_return_sequences:** Number of output sequences returned. If specified, it must be a positive integer.
* **num_beams:** Number of beams used in beam search. If specified, it must be an integer greater than or equal to `num_return_sequences`.
* **no_repeat_ngram_size:** Model ensures that no sequence of `no_repeat_ngram_size` consecutive words is repeated in the output sequence. If specified, it must be a positive integer greater than 1.
* **temperature:** Controls the randomness in the output. A higher temperature makes low-probability words more likely to appear in the output, while a lower temperature favors high-probability words. As `temperature` -> 0, generation approaches greedy decoding. If specified, it must be a positive float.
* **early_stopping:** If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be boolean.
* **do_sample:** If True, sample the next word according to its likelihood instead of always picking the most likely one. If specified, it must be boolean.
* **top_k:** In each step of text generation, sample from only the `top_k` most likely words. If specified, it must be a positive integer.
* **top_p:** In each step of text generation, sample from the smallest possible set of words with cumulative probability `top_p`. If specified, it must be a float between 0 and 1.
* **seed:** Fix the randomized state for reproducibility. If specified, it must be an integer.
* **return_full_text:** If True, input text will be part of the output generated text. If specified, it must be boolean. The default value for it is False.
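
To make the effect of `temperature`, `top_k`, and `top_p` more concrete, here is a minimal pure-Python sketch of how these settings are commonly applied to next-word scores. It is an illustration only and not the endpoint's actual implementation; the function names (`softmax_with_temperature`, `top_k_filter`, `top_p_filter`) and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely candidates and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of candidates whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


# Hypothetical scores for three candidate words.
example_logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(example_logits, 0.5)  # sharper distribution
hot = softmax_with_temperature(example_logits, 2.0)   # flatter distribution
print("temperature=0.5:", [round(x, 3) for x in cold])
print("temperature=2.0:", [round(x, 3) for x in hot])
print("top_k=2 keeps candidates:", sorted(top_k_filter(cold, 2)))
print("top_p=0.8 keeps candidates:", sorted(top_p_filter(hot, 0.8)))
```

With a lower temperature, the most likely candidate takes a larger share of the probability mass, which is why very small temperatures behave like greedy decoding.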

We may specify any subset of the parameters mentioned above while invoking an endpoint.
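
As a rough sketch of how several of these parameters could be combined into one request, the helper below builds a payload dictionary and checks the constraints listed above before anything is sent. The helper name `build_payload` and its validation rules are assumptions made for illustration (for example, it treats the bounds of `top_p` as inclusive); the endpoint performs its own validation and may behave differently.

```python
# Parameter groups taken from the list above (hypothetical helper, not part of the SDK).
POSITIVE_INT_PARAMS = {"max_length", "num_return_sequences", "num_beams", "top_k"}
BOOL_PARAMS = {"early_stopping", "do_sample", "return_full_text"}


def _is_int(value):
    """True for integers, excluding booleans (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def build_payload(text_inputs, **params):
    """Return a request payload after checking the documented constraints."""
    for name, value in params.items():
        if name in POSITIVE_INT_PARAMS:
            valid = _is_int(value) and value > 0
        elif name in BOOL_PARAMS:
            valid = isinstance(value, bool)
        elif name == "no_repeat_ngram_size":
            valid = _is_int(value) and value > 1
        elif name == "temperature":
            valid = _is_number(value) and value > 0
        elif name == "top_p":
            # Assumption: bounds are treated as inclusive here.
            valid = _is_number(value) and 0 <= value <= 1
        elif name == "seed":
            valid = _is_int(value)
        else:
            raise ValueError(f"Unknown parameter: {name}")
        if not valid:
            raise ValueError(f"Invalid value for {name}: {value!r}")

    # num_beams must be at least num_return_sequences when both are given.
    if "num_beams" in params and "num_return_sequences" in params:
        if params["num_beams"] < params["num_return_sequences"]:
            raise ValueError("num_beams must be >= num_return_sequences")

    return {"text_inputs": text_inputs, **params}


payload = build_payload(
    "Building a website can be done in 10 simple steps:",
    max_length=110,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    seed=42,
)
print(payload)
```

The resulting dictionary can then be passed to `predictor.predict(payload)` in the same way as the payloads in the earlier cells.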

***

### Clean up the endpoint

In [None]:
# Delete the SageMaker model and endpoint
predictor.delete_model()
predictor.delete_endpoint()