## Lab 2: Test the RAG solution using Kendra and LangChain

***
This notebook is designed to run on the `Python 3 Data Science 3.0` kernel in Amazon SageMaker Studio.
***

First, we will install the necessary packages and prepare the environment.


In [None]:
!pip install "sagemaker==2.163.0" --upgrade --quiet
!pip install ipywidgets==7.0.0 langchain==0.0.224 boto3==1.26.165 --quiet
!pip install faiss-cpu --quiet
!pip install unstructured --quiet

In [None]:
from langchain.retrievers import AmazonKendraRetriever
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
import sys
import json
import os
from typing import Any, Dict, List, Optional
import boto3


In [None]:
%%bash
export STACK_NAME=sagemaker-llm-kendra-rag-stack
export KENDRA_INDEX_ID=$(aws cloudformation describe-stacks \
    --stack-name $STACK_NAME \
    --query 'Stacks[0].Outputs[?OutputKey==`KendraIndexID`].OutputValue' --output text)
echo "Kendra Index ID: ${KENDRA_INDEX_ID}"

Please fill in the cell below with the ID of the Kendra index that was created by the CloudFormation template. The previous cell prints this ID.

In [None]:
region = boto3.Session().region_name
kendra_index_id = "<FILL IN>"
endpoint_name = "falcon-7b-instruct-2xl"
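
As an alternative to copying the ID by hand, the same lookup that the bash cell performs can be done in Python. The following is a minimal sketch, not part of the original lab: `find_stack_output` is a hypothetical helper, and the sample response is hand-written with a made-up ID so that the snippet runs without AWS access. It assumes the response has the shape returned by the CloudFormation `describe_stacks` call.

```python
from typing import Optional


def find_stack_output(describe_stacks_response: dict, output_key: str) -> Optional[str]:
    """Return the OutputValue for output_key from the first stack, or None if absent."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return None
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output.get("OutputValue")
    return None


# Hand-written stand-in for a describe_stacks response (the ID is made up).
sample_response = {
    "Stacks": [
        {"Outputs": [{"OutputKey": "KendraIndexID", "OutputValue": "example-index-id"}]}
    ]
}
print(find_stack_output(sample_response, "KendraIndexID"))

# In the notebook, the real response could be fetched like this (requires AWS credentials):
# response = boto3.client("cloudformation").describe_stacks(
#     StackName="sagemaker-llm-kendra-rag-stack"
# )
# kendra_index_id = find_stack_output(response, "KendraIndexID")
```

If the helper returns `None`, the stack either does not exist in the current region or does not expose a `KendraIndexID` output.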

We can now build the conversation chain using LangChain and its [Kendra retriever function](https://python.langchain.com/docs/modules/data_connection/retrievers/integrations/amazon_kendra_retriever).

In [None]:
model_parameters = {
    "max_new_tokens": 200,
    "temperature": 0.1,
    "seed": 0,
    "num_beams": 1,
    "return_full_text": False,
}

def build_chain():

    class ContentHandler(LLMContentHandler):
        content_type = "application/json"
        accepts = "application/json"

        def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
            input_str = json.dumps({"inputs": prompt, "parameters": {**model_kwargs}})
            return input_str.encode('utf-8')

        def transform_output(self, output: bytes) -> str:
            # `output` is the streaming response body from the endpoint, so read it before decoding.
            response_json = json.loads(output.read().decode("utf-8"))
            return response_json[0]["generated_text"]

    content_handler = ContentHandler()

    llm = SagemakerEndpoint(
        endpoint_name=endpoint_name,
        region_name=region,
        model_kwargs=model_parameters,
        content_handler=content_handler
    )

    retriever = AmazonKendraRetriever(index_id=kendra_index_id)

    prompt_template = """
        The following is a friendly conversation between a human and an AI. 
        The AI is talkative and provides lots of specific details from its context.
        If the AI does not know the answer to a question, it truthfully says it does not know.
        {context}
        Instruction: Based on the above documents, provide a detailed answer for {question}. Answer "don't know" 
        if not present in the document. 
        Helpful Answer:"""
    
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "question"]
    )


    qa = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": PROMPT})
    return qa
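
To see what the content handler sends to and expects from the endpoint, the two transforms can be tried on their own. This is a rough sketch that copies the handler logic into plain functions so it runs without LangChain or a deployed endpoint. The response body is made up, and `io.BytesIO` stands in for the streaming body that the endpoint returns.

```python
import io
import json


def transform_input(prompt: str, model_kwargs: dict) -> bytes:
    # Same request shape as ContentHandler.transform_input in build_chain().
    return json.dumps({"inputs": prompt, "parameters": {**model_kwargs}}).encode("utf-8")


def transform_output(output) -> str:
    # Same parsing as ContentHandler.transform_output: a JSON list whose
    # first item carries the generated text.
    response_json = json.loads(output.read().decode("utf-8"))
    return response_json[0]["generated_text"]


request_body = transform_input(
    "what is Amazon SageMaker?", {"max_new_tokens": 200, "temperature": 0.1}
)
print(json.loads(request_body))

# Made-up response in the shape the handler expects, wrapped in a file-like object.
fake_stream = io.BytesIO(json.dumps([{"generated_text": "A made-up answer."}]).encode("utf-8"))
print(transform_output(fake_stream))
```

If a model returns a different response shape, `transform_output` is the place to adapt the parsing.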

In [None]:
qa_chain = build_chain()
chat_history = []
query = 'what is Amazon SageMaker?'
result = qa_chain({"question": query, "chat_history": chat_history})


In [None]:
result['answer']
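
The chain also accepts earlier turns through `chat_history`, which stays empty in the cell above. The sketch below shows one way to carry the history forward between questions. `ask` is a hypothetical helper, and `fake_chain` is a stand-in with made-up answers so the snippet runs without an endpoint. It assumes the chain takes the history as a list of `(question, answer)` tuples and returns `answer` and `source_documents` keys, as `qa_chain` is configured to do here.

```python
from typing import Any, Callable, Dict, List, Tuple


def ask(
    chain: Callable[[Dict[str, Any]], Dict[str, Any]],
    question: str,
    chat_history: List[Tuple[str, str]],
) -> Dict[str, Any]:
    """Send one question through the chain and record the turn in chat_history."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result


def fake_chain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for qa_chain; returns a made-up answer and no source documents.
    return {"answer": f"(made-up answer to: {inputs['question']})", "source_documents": []}


history: List[Tuple[str, str]] = []
ask(fake_chain, "what is Amazon SageMaker?", history)
follow_up = ask(fake_chain, "How is it priced?", history)
print(len(history))

# With the real chain, each source document's metadata can be inspected like this.
for doc in follow_up["source_documents"]:
    print(doc.metadata)
```

In the notebook, passing `qa_chain` instead of `fake_chain` would let a follow-up question refer back to the previous answer.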

Exercise 1: In the prompt template, change "Helpful Answer" to "Solution" and rerun the same code to see how the model's response changes.

Exercise 2: Did you notice that the response contains some repeated sentences? Try to fix this. Hint: add parameters to the `model_parameters` dictionary.