# Bedrock boto3 Setup

--- 

In this demo notebook, we demonstrate how to use the `boto3` Python SDK to work with [Bedrock](https://aws.amazon.com/bedrock/) foundation models.

---

## Prerequisites

---
Before executing any of the notebooks in this workshop, execute the following cells to add Bedrock extensions to the `boto3` Python SDK.

---

In [None]:
# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell
%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall


You also need to install [langchain](https://github.com/hwchase17/langchain).

In [None]:
%pip install langchain==0.0.190 --quiet

## Create the boto3 client

Interaction with the Bedrock API is done via the boto3 SDK. To create the Bedrock client, we provide a utility method that supports different options for passing credentials to boto3.
If you are running these notebooks from your own computer, make sure you have [installed the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) before proceeding.


#### Use default credential chain

If you are running this notebook from a SageMaker Studio notebook and your SageMaker Studio role has permissions to access Bedrock, you can just run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default credentials have access to Bedrock.

#### Use a different role

In case you or your company has set up a specific role to access Bedrock, you can specify that role by uncommenting the line `#os.environ['BEDROCK_ASSUME_ROLE'] = ''` in the cell below before executing it. Ensure that your current user or role has permission to assume that role.

#### Use a specific profile

In case you are running these notebooks from your own computer, you have set up the AWS CLI with multiple profiles, and the profile that has access to Bedrock is not the default one, you can uncomment the line `#os.environ['AWS_PROFILE'] = ''` and specify the profile to use.

#### Note about `langchain`

The Bedrock classes provided by `langchain` create a default Bedrock boto3 client. We recommend explicitly creating the Bedrock client using the instructions below and passing it to the class instantiation methods using `client=boto3_bedrock`.

In [None]:
# Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access


#import os
#os.environ['BEDROCK_ASSUME_ROLE'] = ''
#os.environ['AWS_PROFILE'] = ''

In [None]:
import os
import sys
import json

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
if 'BEDROCK_ASSUME_ROLE' in os.environ:
    boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))
else:
    boto3_bedrock = bedrock.get_bedrock_client()


#### We can validate our connection by testing out the `list_foundation_models()` method, which will tell us all the models available for us to use 

In [None]:
boto3_bedrock.list_foundation_models()
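
The raw response is a fairly large dictionary. As a small sketch, the helper below pulls out just the model identifiers. It assumes the response contains a `modelSummaries` list whose entries carry a `modelId` key. That shape, the helper name `extract_model_ids`, and the sample data are assumptions for illustration, so check them against the actual output of the cell above.

```python
def extract_model_ids(response):
    """Return the sorted model IDs found in a list_foundation_models() response.

    Assumes (not confirmed by this notebook) that the response looks like
    {"modelSummaries": [{"modelId": "..."}, ...]}.
    """
    summaries = response.get("modelSummaries", [])
    return sorted(s["modelId"] for s in summaries if "modelId" in s)


# Hypothetical sample in the assumed shape, using model IDs that appear in this notebook
sample_response = {
    "modelSummaries": [
        {"modelId": "anthropic.claude-instant-v1"},
        {"modelId": "amazon.titan-tg1-large"},
    ]
}
model_ids = extract_model_ids(sample_response)
print(model_ids)
```

With a live client, something like `extract_model_ids(boto3_bedrock.list_foundation_models())` should give the same kind of list, provided the response shape matches the assumption.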

#### In this Notebook we will be using the `invoke_model()` method of Amazon Bedrock. This will be the primary method we use for most of our Text Generation and Processing tasks. 

# `InvokeModel` body and output

#### We provide details about the input and output formats of `invoke_model()` for the different foundation models

## Titan Large

#### Input
```json
{
    "inputText": "",
    "textGenerationConfig": {
        "maxTokenCount": 512,
        "stopSequences": [],
        "temperature": 0.1,
        "topP": 0.9
    }
}
```

#### Output

```json
{
    "inputTextTokenCount": 613,
    "results": [{
        "tokenCount": 219,
        "outputText": ""
    }]
}
```
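
As a minimal sketch of working with the two formats above, the helpers below build a Titan request body and read the generated text back out of a decoded response. The function names and the sample response are made up for illustration. Only the field names come from the formats shown above.

```python
import json


def build_titan_body(prompt, max_tokens=512, temperature=0.1, top_p=0.9, stop_sequences=None):
    """Serialize a Titan request body using the field names shown above."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "stopSequences": stop_sequences or [],
            "temperature": temperature,
            "topP": top_p,
        },
    })


def parse_titan_output(response_body):
    """Return the generated text of the first result in a decoded Titan response."""
    return response_body["results"][0]["outputText"]


titan_body = build_titan_body("Write a haiku about clouds.")
# Hand-written sample in the documented output shape (not real model output)
titan_sample = {"inputTextTokenCount": 6, "results": [{"tokenCount": 3, "outputText": "sample text"}]}
print(titan_body)
print(parse_titan_output(titan_sample))
```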

## Jurassic Grande and Jumbo 

#### Input

```json
{
    "prompt": "",
    "maxTokens": 200,
    "temperature": 0.5,
    "topP": 0.5,
    "stopSequences": [],
    "countPenalty": {
        "scale": 0
    },
    "presencePenalty": {
        "scale": 0
    },
    "frequencyPenalty": {
        "scale": 0
    }
}
```

#### Output

```json
{
    "id": 1234,
    "prompt": {
        "text": "",
        "tokens": [
            {
                "generatedToken": {
                    "token": "\u2581who\u2581is",
                    "logprob": -12.980147361755371,
                    "raw_logprob": -12.980147361755371
                },
                "topTokens": null,
                "textRange": {
                    "start": 0,
                    "end": 6
                }
            },
            ...
        ]
    },
    "completions": [
        {
            "data": {
                "text": "",
                "tokens": [
                    {
                        "generatedToken": {
                            "token": "<|newline|>",
                            "logprob": 0.0,
                            "raw_logprob": -0.01293118204921484
                        },
                        "topTokens": null,
                        "textRange": {
                            "start": 0,
                            "end": 1
                        }
                    },
                    ...
                ]
            },
            "finishReason": {
                "reason": "endoftext"
            }
        }
    ]
}
```
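
Because the Jurassic response nests the generated text fairly deeply, a small accessor can make the later cells easier to read. This is only a sketch: the helper name and the sample dictionary are invented here, and the token lists are left empty for brevity.

```python
def parse_jurassic_output(response_body):
    """Return (text, finish_reason) for the first completion of a decoded Jurassic response."""
    first = response_body["completions"][0]
    return first["data"]["text"], first["finishReason"]["reason"]


# Hand-written sample in the documented shape (not real model output)
jurassic_sample = {
    "id": 1234,
    "prompt": {"text": "who is", "tokens": []},
    "completions": [
        {
            "data": {"text": "sample completion", "tokens": []},
            "finishReason": {"reason": "endoftext"},
        }
    ],
}
text, reason = parse_jurassic_output(jurassic_sample)
print(text, reason)
```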

## Claude

#### Input

```json
{
    "prompt": "\n\nHuman:\n\nAnswer:",
    "max_tokens_to_sample": 300,
    "temperature": 0.5,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": [
        "\n\nHuman:"
    ]
}
```

#### Output

```json
{
    "completion": " ",
    "stop_reason": "stop_sequence"
}
```
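
The sketch below shows one way to assemble a Claude request body from a plain question. The field names follow the input format above. The helper name is hypothetical, and wrapping the question in `\n\nHuman: ... \n\nAssistant:` turn markers follows Anthropic's text-completion prompt convention rather than anything stated in this notebook, so treat that part as an assumption.

```python
import json


def build_claude_body(question, max_tokens_to_sample=300, temperature=0.5, top_k=250, top_p=1):
    """Serialize a Claude request body using the field names shown above."""
    # Assumption: Anthropic-style turn markers around the user's question
    prompt = f"\n\nHuman: {question}\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens_to_sample,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "stop_sequences": ["\n\nHuman:"],
    })


claude_body = build_claude_body("What is a foundation model?")
print(claude_body)
```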

## Stable Diffusion XL

#### Input

```json
{
    "text_prompts": [
        {
            "text": "this is where you place your input text"
        }
    ],
    "cfg_scale": 10,
    "seed": 0,
    "steps": 50
}
```

#### Output

```json
{
    "result": "success",
    "artifacts": [
        {
            "seed": 123,
            "base64": "",
            "finishReason": "SUCCESS"
        }
    ]
}
```

# Common inference parameter definitions

## Randomness and Diversity

Foundation models support the following parameters to control randomness and diversity in the 
response.

**Temperature** – Large language models use probability to construct the words in a sequence. For any 
given next word, there is a probability distribution of options for the next word in the sequence. When 
you set the temperature closer to zero, the model tends to select the higher-probability words. When 
you set the temperature further away from zero, the model may select a lower-probability word.

In technical terms, the temperature modulates the probability density function for the next tokens, 
implementing the temperature sampling technique. This parameter can deepen or flatten the density 
function curve. A lower value results in a steeper curve with more deterministic responses, and a higher 
value results in a flatter curve with more random responses.
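
To make the effect concrete, here is a minimal sketch of temperature scaling on a toy distribution. The scores and candidate words are made up, and the function is a plain softmax over scores divided by the temperature, which is the usual textbook formulation and not necessarily how any specific Bedrock model implements it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words
candidates = ["horses", "zebras", "unicorns"]
logits = [3.0, 1.5, 0.2]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in zip(candidates, probs)})
```

At the lowest temperature nearly all of the probability sits on the top-scoring word, while at the highest temperature the three words are much closer together.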

**Top K** – Temperature defines the probability distribution of potential words, and Top K defines the 
cutoff beyond which the model no longer selects words. For example, if K=50, the model selects from the 
50 most probable words that could be next in a given sequence. This reduces the probability that an 
unusual word gets selected next in a sequence.

In technical terms, Top K is the number of highest-probability vocabulary tokens to keep for Top-K 
filtering. This limits the distribution of probable tokens, so the model chooses one of the 
highest-probability tokens.
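
A rough sketch of Top K filtering on a toy distribution follows. The probabilities are invented, and renormalizing the kept tokens so they sum to 1 is a common choice that this notebook does not itself specify.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    if k <= 0:
        raise ValueError("k must be at least 1")
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-word distribution
next_word_probs = {"horses": 0.6, "zebras": 0.3, "unicorns": 0.1}
print(top_k_filter(next_word_probs, k=2))
```

With `k=2` the least likely word is dropped entirely, so it can no longer be sampled regardless of the temperature.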

**Top P** – Top P defines a cutoff based on the sum of probabilities of the potential choices. If you set 
Top P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is 
similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their 
probabilities.

For the example prompt "I hear the hoof beats of ," you may want the model to provide "horses," 
"zebras" or "unicorns" as the next word. If you set the temperature to its maximum, without capping 
Top K or Top P, you increase the probability of getting unusual results such as "unicorns." If you set the 
temperature to 0, you increase the probability of "horses." If you set a high temperature but cap Top K 
or Top P at a low value, you increase the probability of "horses" or "zebras," and decrease the 
probability of "unicorns."
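
The same toy distribution can illustrate Top P (nucleus) filtering. This is a sketch of the general technique under made-up probabilities: tokens are taken in order of decreasing probability until their cumulative probability reaches the threshold, and the rest are discarded.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1
    return {token: prob / cumulative for token, prob in kept}


# Made-up next-word distribution
next_word_probs = {"horses": 0.6, "zebras": 0.3, "unicorns": 0.1}
print(top_p_filter(next_word_probs, p=0.8))
print(top_p_filter(next_word_probs, p=0.5))
```

Unlike Top K, the number of surviving tokens depends on how the probability mass is spread: a threshold of 0.5 keeps a single word here, while 0.8 keeps two.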

## Length

The following parameters control the length of the generated response.

**Response length** – Configures the minimum and maximum number of tokens to use in the generated 
response.

**Length penalty** – Length penalty optimizes the model to be more concise in its output by penalizing 
longer responses. Length penalty differs from response length as the response length is a hard cut off for 
the minimum or maximum response length.

In technical terms, the length penalty penalizes the model exponentially for lengthy responses. 0.0 
means no penalty. Set a value less than 0.0 for the model to generate longer sequences, or set a value 
greater than 0.0 for the model to produce shorter sequences.

## Repetitions

The following parameters help control repetition in the generated response.

**Repetition penalty (presence penalty)** – Prevents repetitions of the same words (tokens) in responses. 
1.0 means no penalty. Greater than 1.0 decreases repetition.
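
The notebook does not say how each model applies this penalty. As an illustration only, the sketch below uses one common formulation in which the score of every token that has already been generated is divided by the penalty when positive and multiplied by it when negative. The scores and the helper name are made up.

```python
def apply_repetition_penalty(logits, generated_tokens, penalty):
    """Lower the scores of tokens that already appeared in the output.

    One common formulation (an assumption here, not taken from this notebook):
    positive scores are divided by the penalty, negative scores are multiplied by it.
    """
    adjusted = dict(logits)
    for token in set(generated_tokens):
        if token in adjusted:
            score = adjusted[token]
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


# Made-up scores; "decision" and "the" have already been generated
scores = {"decision": 2.0, "leader": 1.0, "the": -0.5}
print(apply_repetition_penalty(scores, ["decision", "the"], penalty=1.0))
print(apply_repetition_penalty(scores, ["decision", "the"], penalty=1.5))
```

A penalty of 1.0 leaves every score unchanged, and a value above 1.0 pushes already-used tokens down while leaving unused ones alone.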

In [None]:
prompt_data = """Command: Write me a blog about making strong business decisions as a leader.\nBlog:"""

## Accessing Bedrock Foundation Models

### Let's try the prompt with the Titan Model on Bedrock

In [None]:
prompt_data = """Command: Write me a blog about making strong business decisions as a leader.\nBlog:""" # If you'd like to try your own prompt, edit this parameter!

In [None]:
body = json.dumps({"inputText": prompt_data})
modelId = "amazon.titan-tg1-large" 
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
 body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body.get("results")[0].get("outputText"))

### Let's try the prompt with the Anthropic Claude Instant Model on Bedrock

In [None]:
body = json.dumps({"prompt": prompt_data, "max_tokens_to_sample": 500})
modelId = "anthropic.claude-instant-v1" # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
 body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completion"))

### Let's try the prompt with the Jurassic Grande Model on Bedrock

In [None]:
body = json.dumps({"prompt": prompt_data, "maxTokens": 200})
modelId = "ai21.j2-grande-instruct" # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
 body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body.get("completions")[0].get("data").get("text"))
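
The three cells above repeat the same call pattern and differ only in the model ID, the request payload, and the path to the generated text. A possible way to factor out the shared part is sketched below. `invoke_text_model` and `FakeBedrockClient` are hypothetical names, and the fake client exists only so the sketch runs offline without credentials.

```python
import io
import json


def invoke_text_model(client, model_id, payload):
    """Call invoke_model on the given client and return the decoded JSON response body."""
    response = client.invoke_model(
        body=json.dumps(payload),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())


class FakeBedrockClient:
    """Hypothetical stand-in for boto3_bedrock that echoes the prompt back."""

    def invoke_model(self, body, modelId, accept, contentType):
        prompt = json.loads(body)["prompt"]
        canned = {"completion": f"(echo) {prompt}", "stop_reason": "stop_sequence"}
        return {"body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


result = invoke_text_model(
    FakeBedrockClient(),
    "anthropic.claude-instant-v1",
    {"prompt": "Hello", "max_tokens_to_sample": 50},
)
print(result["completion"])
```

Passing `boto3_bedrock` in place of the fake client should work the same way, with the caller picking the right key (`results`, `completion`, or `completions`) for the chosen model.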

### Let's try the streaming output from Bedrock

In [None]:
from IPython.display import display, display_markdown, Markdown, clear_output

body = json.dumps({"prompt": prompt_data, "max_tokens_to_sample": 200})
modelId = "anthropic.claude-instant-v1" # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model_with_response_stream(body=body, modelId=modelId, accept=accept, contentType=contentType)
stream = response.get('body')
output = []

if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk.get('bytes').decode())
            text = chunk_obj['completion']
            clear_output(wait=True)
            output.append(text)
            display_markdown(Markdown(''.join(output)))
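
If you want the full text as a single string instead of a live display, the chunk handling can be pulled into a small function. The sketch below assumes events shaped like the ones consumed in the cell above (a `chunk` entry whose `bytes` hold a JSON object with a `completion` key). The function name and the hand-made events are for illustration only.

```python
import json


def collect_stream_text(events):
    """Join the 'completion' pieces from a sequence of streaming events into one string."""
    pieces = []
    for event in events:
        chunk = event.get("chunk")
        if chunk:
            pieces.append(json.loads(chunk["bytes"].decode())["completion"])
    return "".join(pieces)


# Hand-made events that mimic the stream, so the sketch runs without a Bedrock call
fake_events = [
    {"chunk": {"bytes": json.dumps({"completion": "Strong "}).encode()}},
    {"chunk": {"bytes": json.dumps({"completion": "decisions"}).encode()}},
    {},  # an event without a chunk is skipped
]
print(collect_stream_text(fake_events))
```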
 

### Let's try the prompt with the Stable Diffusion XL on Bedrock

In [None]:
prompt_data = "a fine image of an astronaut riding a horse on Mars"
body = json.dumps({
 "text_prompts": [
 { 
 "text": prompt_data 
 }
 ],
 "cfg_scale":10,
 "seed":20,
 "steps":50
})
modelId = "stability.stable-diffusion-xl" 
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
 body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

print(response_body['result'])
print(f'{response_body.get("artifacts")[0].get("base64")[0:80]}...')

The output is a base64-encoded string of the image. You can use an image processing library such as Pillow to decode the image, as in the example below:

```python
import base64, io
from PIL import Image
base_64_img_str = response_body.get("artifacts")[0].get("base64")
image = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, "utf-8"))))
```
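
If you only need to save the image to disk, the standard library is enough. The sketch below decodes a base64 string and writes the bytes to a file. The helper name, the output path, and the `.png` extension are assumptions, and the payload is a placeholder (just the PNG file signature) so the example runs without calling Bedrock.

```python
import base64
import os
import tempfile


def save_base64_image(base64_str, path):
    """Decode a base64 string, write the raw bytes to path, and return the byte count."""
    data = base64.b64decode(base64_str)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Placeholder payload: only the 8-byte PNG signature, not a viewable image
placeholder_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
out_path = os.path.join(tempfile.gettempdir(), "bedrock_image_sketch.png")
written = save_base64_image(placeholder_b64, out_path)
print(written, out_path)
```

In the notebook you would pass `response_body.get("artifacts")[0].get("base64")` instead of the placeholder.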

# Embeddings

Use text embeddings to convert text into meaningful vector representations. You input a body of text 
and the output is a (1 x n) vector. You can use embedding vectors for a wide variety of applications. 
Bedrock currently offers one model for text embedding that supports text similarity (finding the 
semantic similarity between bodies of text) and text retrieval (such as search).
For the text embeddings model, the input text size is 512 tokens and the output vector length is 4096.
To use a text embeddings model, use the InvokeModel API operation or the Python SDK.
Use InvokeModel to retrieve the vector representation of the input text from the specified model.

At the time of writing, you can only use `amazon.titan-e1t-medium` as the embedding model via the API.

#### Input

```json
{
 "inputText": ""
}
```

#### Output

```json
{
 "embedding": []
}
```


Let's see how to generate embeddings of some text:

In [None]:
prompt_data = "Amazon Bedrock supports foundation models from industry-leading providers such as \
AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving your unique goals."

In [None]:
body = json.dumps({"inputText": prompt_data})
modelId = "amazon.titan-e1t-medium" # change this to use a different version from the model provider
accept = "application/json"
contentType = "application/json"

response = boto3_bedrock.invoke_model(
 body=body, modelId=modelId, accept=accept, contentType=contentType
)
response_body = json.loads(response.get("body").read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3]+['...']+embedding[-3:]}")
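
Since text similarity is one of the uses mentioned above, here is a minimal sketch of comparing two embedding vectors with cosine similarity. The short vectors are made up to keep the example readable. Real vectors returned by the cell above are much longer, and the choice of cosine similarity as the metric is an assumption, not something the notebook prescribes.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Tiny made-up vectors standing in for embeddings
doc_a = [0.1, 0.3, 0.5]
doc_b = [0.2, 0.6, 1.0]  # points in the same direction as doc_a
doc_c = [0.5, -0.3, 0.0]
print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is usually read as the two texts being semantically similar.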