# Retrieval Augmented Generation and Chatbot Application

LangChain is a framework for developing applications powered by language models. Its key aspects let us augment large language models so that they can perform the tasks our use cases require. At a high level, LangChain covers two things:

- **Data**: Connect a language model to other sources of data
- **Agent**: Allow a language model to interact with its environment

LangChain can be used in two major ways:

<li>Individual Components: LangChain provides modular abstractions for the components necessary to work with language models. LangChain also has collections of implementations for all these abstractions. The components are designed to be easy to use, regardless of whether you are using the rest of the LangChain framework or not.

<li>Use-Case Specific Chains: Chains can be thought of as assembling these components in particular ways in order to best accomplish a particular use case. These are intended to be a higher level interface through which people can easily get started with a specific use case. These chains are also designed to be customizable.

## Topics covered:

In this notebook we will be covering the below topics:

- **LLM** Run an LLM in bare form and check its output
- **Vector DB** Examine vector databases such as FAISS or Chroma and use them to produce better results with RAG
- **Prompt template** Examine the use of prompt templates
- **Question Answering** Retrieval Augmented Generation (RAG)
- **Chatbot** Build an interactive chatbot with memory

## Key points for consideration

1. Long documents that exceed the token limit: chain them with strategies such as Map-Reduce, Refine, and Map-Rerank
2. Cost per token: minimize the token count by sending only the relevant text to the model
3. Which model to use:
    - Cohere, AI21, Huggingface Hub, Manifest, Goose AI, Writer, Banana, Modal, StochasticAI, Cerebrium, Petals, Forefront AI, Anthropic, DeepInfra, and self-hosted Models.
    - Example LLM cohere = Cohere(model='command-xlarge')
    - Example LLM flan = HuggingFaceHub(repo_id="google/flan-t5-xl")
4. Input data sources: PDF, web pages, CSV, S3, EFS
5. Orchestration with external tasks
    - External tasks: agents using SerpApi or other search engines
    - Math calculator
6. Conversation Management and History
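The first point (documents longer than the token limit) can be illustrated with a small map-reduce sketch. This is not LangChain's implementation: `summarize` is a hypothetical stand-in for an LLM call that simply keeps the first sentence, and word count is used as a crude proxy for tokens.

```python
from typing import Callable, List


def split_by_budget(text: str, max_words: int) -> List[str]:
    """Split text into pieces of at most max_words words (a crude proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def map_reduce(text: str, llm: Callable[[str], str], max_words: int) -> str:
    """Map: summarize each piece on its own. Reduce: summarize the joined summaries."""
    partial = [llm(piece) for piece in split_by_budget(text, max_words)]
    return llm(" ".join(partial))


long_text = "SageMaker trains models. It also hosts them. Spot capacity lowers cost. Endpoints serve traffic."
pieces = split_by_budget(long_text, max_words=8)
final_summary = map_reduce(long_text, summarize, max_words=8)
print(len(pieces), final_summary)
```

Each piece stays under the budget, so no single call exceeds the limit; the trade-off is one model call per piece plus one for the reduce step.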

### Key components of LangChain

Let us examine the key components of LangChain. At the center of everything is the large language model.

There are several main modules that LangChain provides support for. For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides. These modules are, in increasing order of complexity:

**Models**: The various model types and model integrations LangChain supports.

<img src='./images/models.png' width ="300"/>

    
**Prompts**: This includes prompt management, prompt optimization, and prompt serialization.
    
<img src="images/prompt.png" width="300"/>
    
**Memory**: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.

    
**Indexes**: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that.
    
<img src="images/vectorstore.png" width="300"/>

**Chains**: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

<img src="images/chains.png" width="300"/>

**Agents**: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.


    
**Callbacks**: It can be difficult to track all that occurs inside a chain or agent. Callbacks help add a level of observability and introspection.
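The Agents and Callbacks descriptions above can be made concrete with a toy loop. This is a sketch under stated assumptions, not LangChain's agent API: `fake_policy` is a hypothetical stand-in for the LLM decision step, `TOOLS` holds one illustrative tool, and `on_event` plays the role of a callback.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: the names and behaviour are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def fake_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM decision step: returns (action, action_input)."""
    if not observations:
        return "calculator", question      # first step: use a tool
    return "finish", observations[-1]      # afterwards: answer with the last observation


def run_agent(question: str, on_event: Callable[[str, str], None], max_steps: int = 5) -> str:
    """Decide on an action, run it, record the observation, repeat until 'finish'."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, action_input = fake_policy(question, observations)
        on_event("action", f"{action}({action_input})")  # callback hook for observability
        if action == "finish":
            return action_input
        observation = TOOLS[action](action_input)
        on_event("observation", observation)
        observations.append(observation)
    return "stopped: step limit reached"


events: List[Tuple[str, str]] = []
answer = run_agent("2+3", lambda kind, detail: events.append((kind, detail)))
print(answer, events)
```

The `max_steps` guard is a design choice worth keeping in any real agent: without it, a policy that never returns `finish` would loop indefinitely.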
 
    

### Chat Bot key elements

The first process in a chatbot is generating embeddings. Typically an ingestion process runs your documents through an embedding model and stores the resulting embeddings in a vector store. In this example we use a GPT-J embeddings model for this step.

<img src="images/Embeddings_lang.png" width="300"/>

The second process is user request orchestration: handling the interaction, invoking the model, and returning the results.

<img src="images/Chatbot_lang.png" width="300"/>

For processes that need deeper analysis or a long conversation history, we summarize every interaction to keep it succinct. The flow below uses Pinecone as an example and shows the various tools that are available.

<img src="images/chatbot_internet.jpg" width="300"/>

# Pre-Requisites

There are a few prerequisites to complete before running this notebook. The key one is setting up the LLM to be used.
<li> Either have a FLAN-T5 model deployed in SageMaker using Lab 5, "Deploy FlanT5-XXL", from https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai/llm-workshop
<li> Or have an Anthropic model key. You can do both or either one. However, certain cells might not work if you have only one, and you can ignore those errors during the run.



### LLM model deploy in SageMaker

Make sure that you have run the notebook `1_deploy-flan-t5-xl.ipynb`.


In [7]:
%store -r endpoint_name

In [8]:
import os
os.environ["FLAN_XL_ENDPOINT"]=endpoint_name
print(os.environ["FLAN_XL_ENDPOINT"])

huggingface-text2text-flan-t5-xl-1686836752


In [9]:
!apt update

Hit:1 http://deb.debian.org/debian buster InRelease
Hit:2 http://deb.debian.org/debian buster-updates InRelease
Get:3 http://security.debian.org/debian-security buster/updates InRelease [34.8 kB]
Get:4 http://security.debian.org/debian-security buster/updates/main amd64 Packages [515 kB]
Fetched 550 kB in 0s (1541 kB/s)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
74 packages can be upgraded. Run 'apt list --upgradable' to see them.


In [10]:
!apt install wkhtmltopdf -y

Reading package lists... Done
Building dependency tree       
Reading state information... Done
wkhtmltopdf is already the newest version (0.12.5-1+deb10u1).
0 upgraded, 0 newly installed, 0 to remove and 74 not upgraded.


### Install the libraries needed for this run

These are listed in requirements.txt, or you can run the cells below for finer control over which libraries you install.

In [11]:
!pip install --upgrade pip

In [12]:
!pip install langchain==0.0.161 --quiet

In [13]:
# !pip install chromadb==0.3.21 --quiet

In [14]:
!pip install langchain==0.0.161 boto3 html2text jinja2 --quiet

In [15]:
!pip install faiss-cpu==1.7.4 --quiet

In [16]:
!pip install pypdf==3.8.1 --quiet

In [17]:
!pip install transformers==4.24.0 --quiet

In [18]:
!pip install sentence_transformers==2.2.2 --quiet

In [19]:
!pip install pdfkit

In [20]:
import sentence_transformers 
sentence_transformers.__version__

'2.2.2'

In [21]:
print("all libraries installed")

all libraries installed


### Import statements for our chain and indexers. We are not using any explicit agent here

In [22]:
#from aws_langchain.kendra_index_retriever import KendraIndexRetriever
import json
import os
import sys
import time
from typing import Any, Dict, List, Optional

import boto3
import sagemaker
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.session import Session
from sagemaker.utils import name_from_base

from langchain import SagemakerEndpoint
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase
from langchain.prompts import PromptTemplate

In [23]:
import sagemaker
import boto3
import jinja2
role = sagemaker.get_execution_role()  # execution role for the endpoint

### [Optional] -  Deploy a GPT-J embeddings Model - so we can use that to generate the embeddings for the documents

This section requires a larger instance type, `ml.g5.24xlarge`, which is not available in the workshop setting. If you are running in your own account and have access to `ml.g5.24xlarge`, you can uncomment the code below to deploy the GPT-J model and use it as an embeddings model.

The embeddings model is used for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).

Other possible embeddings are listed under [LangChain Embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html).

In [24]:
# _MODEL_CONFIG_ = {
#     "huggingface-textembedding-gpt-j-6b": {
#         "instance type": "ml.g5.24xlarge",
#         "env": {"TS_DEFAULT_WORKERS_PER_MODEL": "1"},
#     },
# }
# # - Uncomment and set these values in case you have an instance of GPT-J deployed already 
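# model_id = "huggingface-textembedding-gpt-j-6b"
# model_version = "*"  # assumption: "*" selects the latest JumpStart version; the deployment cell needs this name defined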
# # _MODEL_CONFIG_[model_id]["endpoint_name"] = '<endpoint_name>'  
# # print( f'24xlarge::{_MODEL_CONFIG_[model_id]["endpoint_name"]}')
# #

In [25]:
# newline, bold, unbold = "\n", "\033[1m", "\033[0m"

# for model_id in _MODEL_CONFIG_:
#     endpoint_name = name_from_base(f"jumpstart-example-embedding-{model_id}")
#     inference_instance_type = _MODEL_CONFIG_[model_id]["instance type"]

#     # Retrieve the inference container uri. This is the base HuggingFace container image for the default model above.
#     deploy_image_uri = image_uris.retrieve(
#         region=None,
#         framework=None,  # automatically inferred from model_id
#         image_scope="inference",
#         model_id=model_id,
#         model_version=model_version,
#         instance_type=inference_instance_type,
#     )
#     # Retrieve the model uri.
#     model_uri = model_uris.retrieve(
#         model_id=model_id, model_version=model_version, model_scope="inference"
#     )
#     model_inference = Model(
#         image_uri=deploy_image_uri,
#         model_data=model_uri,
#         role=role,
#         predictor_cls=Predictor,
#         name=endpoint_name,
#         env=_MODEL_CONFIG_[model_id]["env"],
#     )
#     model_predictor_inference = model_inference.deploy(
#         initial_instance_count=1,
#         instance_type=inference_instance_type,
#         predictor_cls=Predictor,
#         endpoint_name=endpoint_name,
#     )
#     print(f"{bold}Model {model_id} has been deployed successfully.{unbold}{newline}")
#     _MODEL_CONFIG_[model_id]["endpoint_name"] = endpoint_name

In [26]:
# from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler, SagemakerEndpointEmbeddings
# from langchain.embeddings.base import Embeddings
# from langchain.llms.sagemaker_endpoint import ContentHandlerBase
# import numpy as np
# import boto3
# import os

# class SagemakerEndpointEmbeddingsLMI(SagemakerEndpointEmbeddings):
#     def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
#         """Compute doc embeddings using a SageMaker Inference Endpoint.

#         Args:
#             texts: The list of texts to embed.
#             chunk_size: The chunk size defines how many input texts will
#                 be grouped together as request. If None, will use the
#                 chunk size specified by the class.

#         Returns:
#             List of embeddings, one for each text.
#         """
#         results = []
#         _chunk_size = len(texts) if chunk_size > len(texts) else chunk_size

#         for i in range(0, len(texts), _chunk_size):
#             response = self._embedding_func(texts[i : i + _chunk_size])
#             print()
#             results.extend(response)
#         return results


# class ContentHandlerEmbdSM(EmbeddingsContentHandler): #ContentHandlerBase):
#     content_type = "application/json"
#     accepts = "application/json"

#     def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
#         input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
#         #input_str = json.dumps({"inputs": prompt, "parameters": model_kwargs})
#         return input_str.encode("utf-8")

#     def transform_output(self, output: bytes) -> str:
#         response_json = json.loads(output.read().decode("utf-8"))
#         #print(f"EMBEDDINGS::RESPONSE:{response_json}::")
#         embeddings = response_json["embedding"]
#         print(f"EMBEDDINGS::RESPONSE::len[0]:{len(embeddings[0])}::current shape -- > {np.array(embeddings).shape}:: shape after unsqueeze -- > {np.array([embeddings]).shape}")
#         if len(embeddings) == 1: # for the query embeddings - should be 1D vector because faiss will unsqueeze it 
#             print(f"EMBEDDINGS::returning:NO:SQUEEZE:: RESPONSE:{np.array(embeddings).shape}::")
#             return embeddings #[0]
#         return embeddings # embeddings expected to be of shape 2D List[List[float]] -- >array 1 row with n dimensions


# assumed_role = os.getenv('LANGCHAIN_ASSUME_ROLE', None)
# print(assumed_role)
# boto3_kwargs = {}
# session = boto3.Session()
# if assumed_role:
#     sts = session.client("sts")
#     response = sts.assume_role(
#         RoleArn=str(assumed_role), #"arn:aws:iam::425576326687:role/SageMakerStudioDomainNoAuth-SageMakerExecutionRole-3RBLN6GPZ46O",
#         RoleSessionName="langchain-llm-1"
#     )
#     print(response)
#     boto3_kwargs = dict(
#         aws_access_key_id=response['Credentials']['AccessKeyId'],
#         aws_secret_access_key=response['Credentials']['SecretAccessKey'],
#         aws_session_token=response['Credentials']['SessionToken']
#     )

# boto3_sm_client = boto3.client(
#     "sagemaker-runtime",
#     **boto3_kwargs
# )
# print(boto3_sm_client)
# content_handler_embd_sm = ContentHandlerEmbdSM()
# hf_embeddings = SagemakerEndpointEmbeddingsLMI(
#     client = boto3_sm_client,
#     endpoint_name=_MODEL_CONFIG_["huggingface-textembedding-gpt-j-6b"]["endpoint_name"], #os.environ["FLAN_XXL_ENDPOINT"],
#     region_name='us-east-1',
#     content_handler=content_handler_embd_sm,
# )
# hf_embeddings

### Use HuggingFaceEmbeddings in the workshop setting. 
If you are in a workshop, please use the code below. If you are using the GPT-J model to generate the embeddings, please comment out the cell below.

In [27]:
from langchain.embeddings import HuggingFaceEmbeddings
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Extra, Field
from langchain.embeddings.base import Embeddings
import numpy as np

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}


class CustomHFEmbeddings(HuggingFaceEmbeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Compute doc embeddings using a HuggingFace transformer model.

        Args:
            texts: The list of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        texts = list(map(lambda x: x.replace("\n", " "), texts))
        embeddings = self.client.encode(texts, **self.encode_kwargs)
        # shape is (number_of_texts, 768) for all-mpnet-base-v2
        print(f"CustomHFEmbeddings::embed_documents::shape:returned -- > {embeddings.shape}:")
        
        return embeddings.tolist()

    def embed_query(self, text: str) -> List[float]:
        """Compute query embeddings using a HuggingFace transformer model.

        Args:
            text: The text to embed.

        Returns:
            Embeddings for the text.
        """
        text = text.replace("\n", " ")
        embedding = self.client.encode(text, **self.encode_kwargs)
        print(f"CustomHFEmbeddings::QUERY::shape:returned -- > {embedding.shape}:")
        return embedding.tolist()

hf_embeddings = CustomHFEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

### Test the flanT5 model 
Testing Flan T5 model for answering a random question.

In [28]:
MAX_LENGTH = 256
NUM_RETURN_SEQUENCES = 1
TOP_K = 0
TOP_P = 0.7
DO_SAMPLE = True 

In [29]:
boto3_kwargs = {}
session = boto3.Session()

boto3_sm_client = boto3.client("sagemaker-runtime")
print(boto3_sm_client)
prompt = f"Answer this question below, How can it help me? "
print(f"Question being asked is -- > {prompt}:")

payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}

payload = json.dumps(payload).encode('utf-8')

boto3_sm_client.invoke_endpoint(
    EndpointName=os.environ["FLAN_XL_ENDPOINT"],
    Body=payload,
    ContentType="application/json",
)["Body"].read().decode("utf8")

<botocore.client.SageMakerRuntime object at 0x7fed249b13d0>
Question being asked is -- > Answer this question below, How can it help me? :


'{"generated_texts": ["The iron will be used to raise the metal plate, which will be heated and the tin will melt."]}'
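The endpoint returns a JSON string rather than plain text. A minimal sketch of extracting the answer from it, assuming the response keeps the `generated_texts` key seen in the output above (the helper name `first_generated_text` is introduced here for illustration):

```python
import json

# The raw string returned by the endpoint in the cell above.
raw_response = '{"generated_texts": ["The iron will be used to raise the metal plate, which will be heated and the tin will melt."]}'


def first_generated_text(raw: str) -> str:
    """Return the first generated sequence, or an empty string if none is present."""
    body = json.loads(raw)
    texts = body.get("generated_texts", [])
    return texts[0] if texts else ""


answer = first_generated_text(raw_response)
print(answer)
```

Note that the answer has nothing to do with SageMaker: without any context, the model has no way to know what "it" refers to in the question.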

## Section 2: Use LangChain

We will follow this pattern for the rest of the section

<li>Exploring vector databases
<li>Basics of QA exploring simple chains
<li>Basics of chatbot
<li>Going to prompt templates
<li>Exploring Chains


### Exploring Vector DataBases and Create the Embeddings. 

Use the SageMaker GPT-J embeddings model if you deployed it, or the HuggingFace embeddings defined above otherwise.

#### Use the file based document to retrieve based on embeddings

Run the cells below to download and inspect the dataset.

#### Pull in the data set

In [30]:
original_data = "s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/"

!mkdir -p rag_data
!aws s3 cp --recursive $original_data rag_data

download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to rag_data/Amazon_SageMaker_FAQs.csv


In [31]:
import glob
import os
import pandas as pd

all_files = glob.glob(os.path.join("rag_data/", "*.csv"))

df_knowledge = pd.concat(
    (pd.read_csv(f, header=None, names=["Question", "Answer"]) for f in all_files),
    axis=0,
    ignore_index=True,
)

# keep only the answers by dropping the "Question" column
df_answer = df_knowledge.drop(["Question"], axis=1)

print(df_knowledge.shape)
df_knowledge.head(2)

(154, 2)


|   | Question | Answer |
|---|----------|--------|
| 0 | What is Amazon SageMaker? | Amazon SageMaker is a fully managed service to... |
| 1 | In which Regions is Amazon SageMaker available... | For a list of the supported Amazon SageMaker A... |


In [32]:
#convert to pdf
import pdfkit
pdfkit.from_url('https://aws.amazon.com/sagemaker/faqs/', 'rag_data/Amazon_SageMaker_FAQs.pdf')

True

In [33]:
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders.csv_loader import CSVLoader

In [34]:
import time
import sagemaker, boto3, json
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase

#### Create the embeddings for document search

In [35]:
from langchain.indexes import VectorstoreIndexCreator

#### Vector store indexer. 

This is what stores and matches the embeddings. This notebook showcases Chroma and FAISS; both are transient and held in memory. The VectorStore APIs are documented [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html).

We will use our own custom embeddings implementation. The SageMaker variant needs a reference to the SageMaker endpoint, which it calls to obtain the embeddings. FAISS or Chroma stores these embeddings in memory and uses them whenever the user runs a query.
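To make "stores and matches the embeddings" concrete, here is a toy in-memory store. It is a sketch, not how FAISS or Chroma work internally: `toy_embed` is a hypothetical embedding based on word counts (a real model returns dense vectors), and the search is a brute-force cosine-similarity ranking.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical embedding: lower-cased word counts."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    def __init__(self) -> None:
        self._rows: List[Tuple[str, Dict[str, float]]] = []

    def add_texts(self, texts: List[str]) -> None:
        """Embed each text once at ingestion time and keep it in memory."""
        self._rows.extend((t, toy_embed(t)) for t in texts)

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        """Embed the query and return the k most similar stored texts."""
        q = toy_embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore()
store.add_texts([
    "managed spot training uses spare capacity to lower training cost",
    "sagemaker studio is a web based development environment",
    "endpoints serve real time inference requests",
])
top = store.similarity_search("how does spot training lower cost", k=1)
print(top)
```

Asking for more results than there are stored texts simply returns everything, which is worth remembering when a retriever configured with `k=3` returns a single document.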

#### Use LangChain to leverage a SageMaker LLM 

Let's break down what `VectorstoreIndexCreator` does under the hood. We will also see how to use a customized prompt rather than the default prompt that `VectorstoreIndexCreator` uses.

First, we generate embeddings for each document in the knowledge library with the embedding model. The cell below then sets up the SageMaker-hosted LLM that answers questions over the retrieved documents.


In [36]:
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
import ast

parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
    "temperature": 1,
}
MAX_CHARACTER_TRUNCATION=10000 # at 20k it produced garbage results

class ContentHandlerSMLMI(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        #input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        print(f"ContentHandlerSMLMI::LangChain:::LEN:input_str={len(prompt)}:: will truncate if > {MAX_CHARACTER_TRUNCATION}::")
        if len(prompt) > MAX_CHARACTER_TRUNCATION:
            prompt=prompt[:MAX_CHARACTER_TRUNCATION]
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        #print(f"ContentHandlerSMLMI::LangChain:::LEN:input_str={len(input_str)}::")
        return input_str.encode("utf-8")

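    def transform_output(self, output: bytes) -> str:
        # `output` is the response body stream; read it and parse the JSON
        response_json_dict = json.loads(output.read().decode("utf-8"))
        print(f"ContentHandlerSMLMI::LangChain::output={response_json_dict}:")
        # return the first generated text under the first key (e.g. "generated_texts")
        first_key = list(response_json_dict.keys())[0]
        return response_json_dict[first_key][0]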


content_handler_sm_llm = ContentHandlerSMLMI()
session = boto3.Session()
boto3_sm_client = boto3.client(
    "sagemaker-runtime"
    # **boto3_kwargs
)
print(boto3_sm_client)


sm_llm = SagemakerEndpoint(
    client = boto3_sm_client,
    endpoint_name=os.environ["FLAN_XL_ENDPOINT"],
    region_name='us-east-1',
    model_kwargs=parameters,
    content_handler=content_handler_sm_llm,
)

print(f"SageMaker LLM created at {sm_llm}::")

<botocore.client.SageMakerRuntime object at 0x7fed251a8b20>
SageMaker LLM created at SagemakerEndpoint
Params: {'endpoint_name': 'huggingface-text2text-flan-t5-xl-1686836752', 'model_kwargs': {'max_length': 200, 'num_return_sequences': 1, 'top_k': 250, 'top_p': 0.95, 'do_sample': False, 'temperature': 1}}::
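The content handler does two things: it builds the request body (truncating long prompts) and it unpacks the response. A standalone sketch of that round trip, without LangChain or a live endpoint; `build_request` and `parse_response` are names introduced here, and the response bytes are a canned example in the same shape as the endpoint output seen earlier.

```python
import io
import json
from typing import Any, Dict

MAX_CHARS = 10000  # same limit as MAX_CHARACTER_TRUNCATION in the cell above


def build_request(prompt: str, model_kwargs: Dict[str, Any]) -> bytes:
    """Truncate the prompt, merge in generation parameters, and encode as JSON bytes."""
    return json.dumps({"text_inputs": prompt[:MAX_CHARS], **model_kwargs}).encode("utf-8")


def parse_response(stream: io.BytesIO) -> str:
    """Read the response stream and return the first generated text."""
    body = json.loads(stream.read().decode("utf-8"))
    return body["generated_texts"][0]


request = json.loads(build_request("x" * 12000, {"max_length": 200}))
reply = parse_response(io.BytesIO(b'{"generated_texts": ["a cloud computing service"]}'))
print(len(request["text_inputs"]), request["max_length"], reply)
```

Truncating by characters is blunt: it cuts from the end of the prompt, which in a "stuff" style prompt is where the question and instruction usually sit. Limiting the number or size of retrieved chunks is generally a safer way to stay under the limit.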


#### Load the Data from our Documents Source. 

We load the documents with LangChain document loaders such as [these](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/directory_loader.html) and feed them into the vector store to create the embeddings. We start with the SageMaker FAQ PDF document; the same approach applies to the IRS PDF files.

The plan is to create three loaders and three sets of documents, each split after loading: the first loader for the Amazon FAQ, the second for some of the IRS PDFs, and the third for a random example. Text sources need a separate loader (a text loader rather than a PDF loader).

In [37]:
from langchain.document_loaders import TextLoader
from langchain.document_loaders.csv_loader import CSVLoader

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("rag_data/Amazon_SageMaker_FAQs.pdf")
documents_aws = loader.load() # -- gives 2 docs
documents_split = loader.load_and_split() # - gives 22 docs

vectorstore_faiss_aws = FAISS.from_documents(
    CharacterTextSplitter(chunk_size=300, chunk_overlap=0).split_documents(documents_aws), 
    hf_embeddings, 
    #k=1
    #**k_args
)

#### VectorStore as FAISS

You can read more about the [FAISS](https://arxiv.org/pdf/1702.08734.pdf) in-memory vector store in the linked paper. For our example, the alternatives below would be used in the same way.

Chroma

[Chroma](https://www.trychroma.com/) is a super simple vector search database. The core-API consists of just four functions, allowing users to build an in-memory document-vector store. By default Chroma uses the Hugging Face transformers library to vectorize documents.

Weaviate

[Weaviate](https://github.com/weaviate/weaviate) is a polished tool: it offers a GraphQL API with support for vector search, and it also lets users vectorize their content using Weaviate's built-in modules or custom modules.

In [38]:
%%time
import glob
import os

import pandas as pd
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.vectorstores import Chroma, AtlasDB, FAISS

k_args = {"k": 1}
# - sub_docs = self.text_splitter.split_documents(docs)
# - create Vectorstore
vectorstore_faiss_aws = FAISS.from_documents(
    CharacterTextSplitter(chunk_size=300, chunk_overlap=0).split_documents(documents_aws), 
    hf_embeddings, 
    #k=1
    #**k_args
)

wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)

CustomHFEmbeddings::embed_documents::shape:returned -- > (1, 768):
CPU times: user 131 ms, sys: 8.07 ms, total: 139 ms
Wall time: 86.8 ms
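The log above reports a shape of `(1, 768)`, meaning a single chunk was embedded even though `chunk_size=300` was requested. One possible cause (an assumption, not verified here) is that `CharacterTextSplitter` splits on a separator and the extracted PDF text contains too few of them. For comparison, a plain fixed-window chunker, which is a sketch and not the LangChain implementation, always respects the size limit:

```python
from typing import List


def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Cut text into windows of chunk_size characters, stepping by chunk_size - overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "a" * 700
chunks = chunk_text(sample, chunk_size=300, overlap=0)
overlapped = chunk_text(sample, chunk_size=300, overlap=100)
print([len(c) for c in chunks], [len(c) for c in overlapped])
```

With only one chunk in the index, every query retrieves the same text, so checking the number of chunks after splitting is a cheap sanity test before debugging the LLM output.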


#### First way of running the Query. High Level abstraction

Use `VectorStoreIndexWrapper`, the object that `VectorstoreIndexCreator` produces. It wraps `RetrievalQA` and provides a high-level API for generating the response. We will explore the underlying APIs below.

In [39]:
#query="Simplified method for business use of home deduction"
query="What is SageMaker Spot Instances"

In [40]:
wrapper_store_faiss.query(question="What is Amazon SageMaker Managed Spot Instances?",llm=sm_llm)

CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=264:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['a cloud computing service']}:


'a cloud computing service'

##### Visualize Manually what is going on 


First we retrieve the documents most relevant to the query by comparing embeddings. These documents can then be fed into the LLM, which summarizes them and predicts the answer. Here we can specify the search type (for example `similarity`) and the `k` parameter, the number of documents to return.

In [41]:
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)
result_docs = wrapper_store_faiss.query_with_sources(
    question="What is Amazon SageMaker Managed Spot Instances?",
    llm=sm_llm,
    chain_type="stuff"
)
result_docs

# - or you can use similarity search directly through a retriever
retriever = vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 3})
relevant_docs = retriever.get_relevant_documents(query)   
print(len(relevant_docs))

CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=6250:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['      ']}:
CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
1


##### As a quick Test -- to do it manually, now invoke the LLM endpoint and feed the docs along with the query

The results still will not come close to the answer we are expecting

In [42]:
prompt = f"Summarize this {relevant_docs} "
print(f"Question being asked is -- > {query}:")
payload = {'text_inputs': prompt, 
           'max_length': MAX_LENGTH, 
           'num_return_sequences': NUM_RETURN_SEQUENCES,
           'top_k': TOP_K,
           'top_p': TOP_P,
           'do_sample': DO_SAMPLE}
payload = json.dumps(payload).encode('utf-8')
boto3_sm_client.invoke_endpoint(
    EndpointName=os.environ["FLAN_XL_ENDPOINT"],
    Body=payload,
    ContentType="application/json",
)["Body"].read().decode("utf8")

Question being asked is -- > What is SageMaker Spot Instances:


'{"generated_texts": ["Amazon SageMaker FAQs"]}'

## Exploring Chains and Prompt templates
In this section we will look at the various flavors of chains and prompt templates.


#### Define a Chain

[Chains](https://python.langchain.com/en/harrison-docs-refactor-3-24/modules/chains.html) are the key to having a conversation in a chatbot manner. Here we will test **MANUALLY** injecting the documents retrieved by a similarity search. The final result matches our previous results in any case.

**Simplest QA Chain with NO CONTEXT being passed.**

#### PromptTemplate 

This can be enhanced by using a prompt template. More details  [PROMPT Template](https://python.langchain.com/en/harrison-docs-refactor-3-24/modules/prompts/prompt_templates.html)  

We will start with a simple Chain and build up from there.



In [43]:
# - assume a chat bot asks a question
from langchain.prompts import PromptTemplate
prompt_template = """
  The following is a friendly conversation between a human and an AI. 
  The AI is talkative and provides lots of specific details from its context.
  If the AI does not know the answer to a question, it truthfully says it 
  does not know.
  {context}
  Instruction: Based on the above documents, provide a detailed answer for, {question} Answer "don't know" if not present in the document. Solution:
  """
PROMPT_T = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
PROMPT_T

PromptTemplate(input_variables=['context', 'question'], output_parser=None, partial_variables={}, template='\n  The following is a friendly conversation between a human and an AI. \n  The AI is talkative and provides lots of specific details from its context.\n  If the AI does not know the answer to a question, it truthfully says it \n  does not know.\n  {context}\n  Instruction: Based on the above documents, provide a detailed answer for, {question} Answer "don\'t know" if not present in the document. Solution:\n  ', template_format='f-string', validate_template=True)
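Filling this template amounts to string substitution for `{context}` and `{question}`. A plain-Python sketch of the "stuff" approach, where the retrieved passages are joined and placed into the context slot; the template is shortened, the passages are illustrative rather than taken from the dataset, and `stuff_prompt` is a name introduced here.

```python
from typing import List

# Same instruction wording as the template above, shortened for the example.
TEMPLATE = (
    "{context}\n"
    "Instruction: Based on the above documents, provide a detailed answer for, "
    "{question} Answer \"don't know\" if not present in the document. Solution:"
)


def stuff_prompt(doc_texts: List[str], question: str) -> str:
    """Join the retrieved passages and fill both template variables."""
    context = "\n\n".join(doc_texts)
    return TEMPLATE.format(context=context, question=question)


passages = [
    "Managed Spot Training uses spare capacity.",  # illustrative passage, not from the dataset
    "It can reduce training cost.",                # illustrative passage, not from the dataset
]
prompt = stuff_prompt(passages, "What is Managed Spot Training?")
print(prompt)
```

Passing the passage text, rather than a list of document objects, keeps metadata and object representations out of the prompt and leaves more of the character budget for actual content.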

In [44]:
%%time
## -- Load and run the Chain based on the prompt
query="What is Amazon Managed SageMaker Spot Instances?"

# - increasing the search to 8 relevant documents works for the GPT-J embeddings model
relevant_docs = vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 3}).get_relevant_documents(query)   
print(len(relevant_docs))
chain = load_qa_chain(llm=sm_llm, prompt=PROMPT_T)
result = chain({"input_documents": relevant_docs, "question": query}, return_only_outputs=True)
result


CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
1
ContentHandlerSMLMI::LangChain:::LEN:input_str=458:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ["don't know"]}:
CPU times: user 115 ms, sys: 11.9 ms, total: 127 ms
Wall time: 434 ms


{'output_text': "don't know"}

#####  LLM Chain is another flavour for a simple chain. In reality you will be using a combination of few different chains as we will see in the chatbot section

In [45]:
%%time
from langchain.chains import LLMChain

query="What is Amazon SageMaker Managed Spot Instances?"
chain_t = LLMChain(llm=sm_llm, prompt=PROMPT_T)
## -- Invoke the Chain ( call LLM ) to generate the Response
result = chain_t({"context": relevant_docs, "question": query}, return_only_outputs=True)
print(query)
result['text']

ContentHandlerSMLMI::LangChain:::LEN:input_str=563:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ["don't know"]}:
What is Amazon SageMaker Managed Spot Instances?
CPU times: user 1.76 ms, sys: 3.86 ms, total: 5.62 ms
Wall time: 374 ms


"don't know"

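Note that `LLMChain` fills `{context}` with whatever value it receives, so passing the `relevant_docs` list directly will most likely insert the list's string representation (metadata included) rather than the plain text. Below is a minimal sketch of joining the text first. `Doc` is a hypothetical stand-in for LangChain's `Document` (only a `page_content` attribute is assumed), and `docs_to_context` is an illustrative helper, not a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def docs_to_context(docs, separator="\n\n"):
    """Join the text of each document into one context string."""
    return separator.join(doc.page_content for doc in docs)


docs = [
    Doc("Managed Spot Training uses Amazon EC2 Spot instances.", {"source": "faq.csv"}),
    Doc("Spot instances can reduce training cost.", {"source": "faq.csv"}),
]

context = docs_to_context(docs)
print(context)
```

With the notebook's objects, this would roughly correspond to passing `docs_to_context(relevant_docs)` as the `context` value instead of the raw list.
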
#### With LangChain we do not need to manage this explicitly; the starting point is a RetrievalQA chain 
The RetrievalQA chain uses `load_qa_chain` under the hood: it retrieves the most relevant chunks of text and feeds them into the language model. In most situations we will build more complex chains with the Chain module to answer the user's query. Here we use RetrievalQA and pass in the vector store to get the same results.

However, the results do not yet match our expectations.
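
The retrieve-then-stuff flow described above can be sketched without LangChain. The example is illustrative only: `embed` is a toy bag-of-words embedding, `fake_llm` is a placeholder for the real endpoint, and all function names are hypothetical rather than part of the RetrievalQA API.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def stuff_prompt(query, docs):
    """'Stuff' strategy: put every retrieved chunk into a single prompt."""
    context = "\n\n".join(docs)
    return f"{context}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt):
    """Placeholder for the model call; reports the prompt size only."""
    return f"(model would answer from a {len(prompt)}-character prompt)"


chunks = [
    "Managed Spot Training lets you train models using Amazon EC2 Spot instances.",
    "Amazon S3 is an object storage service.",
    "SageMaker notebooks provide a hosted Jupyter environment.",
]
query = "What is Managed Spot Training?"
top_docs = retrieve(query, chunks, k=1)
prompt = stuff_prompt(query, top_docs)
print(fake_llm(prompt))
```

Because every retrieved chunk goes into one prompt, a larger `k` directly increases the prompt length, which is why the value of `k` has to fit what the model can handle.
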

In [46]:
qa = RetrievalQA.from_chain_type(
    llm=sm_llm, 
    chain_type="stuff", 
    retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 3})
    # - k of 8 brings 32k chars which is more than what our LLM can handle
)

#query="Simplified method for business use of home deduction"
query="What is Amazon SageMaker Managed Spot Instances?"
result = qa.run(query)
result

CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=264:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['a cloud computing service']}:


'a cloud computing service'

#### Retrieval QA Chain

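You should see better results with `VectorRun` using the QA chain. Note that, as written, the cell below builds the same `RetrievalQA` chain as the previous one (no custom prompt is passed), so the output is unchanged.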

In [47]:
qa_prompt = RetrievalQA.from_chain_type(
    llm=sm_llm, 
    chain_type="stuff", 
    retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 3})
)
#query = "Which instances can I use with Managed Spot Training in SageMaker?"
result = qa_prompt.run(query)
result

CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=264:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['a cloud computing service']}:


'a cloud computing service'

## Chatbot application

#### For the chatbot we need `context management, history, vector stores, and many other things`. We will start with a ConversationalRetrievalChain

This chain combines conversation memory with a RetrievalQA chain, which allows passing in chat history that can be used for follow-up questions. Source: https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html

Set `verbose` to `True` to see what is going on behind the scenes.

**We use a custom prompt template to tune the output responses**
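
The flow of a conversational retrieval chain is roughly: condense the follow-up question and the chat history into a standalone question, retrieve documents for it, then answer from those documents. Below is a minimal sketch with hypothetical stubs in place of the LLM and the retriever. None of these names are LangChain APIs, and the prompt wording is only loosely modelled on the template used in this notebook.

```python
def condense_question(chat_history, question, llm):
    """Rewrite a follow-up as a standalone question using the chat history."""
    if not chat_history:
        return question  # nothing to condense on the first turn
    history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in chat_history)
    prompt = (
        f"Chat History:\n{history}\n"
        f"Follow Up Input: {question}\n"
        "Standalone question:"
    )
    return llm(prompt)


def chat_turn(question, chat_history, llm, retriever):
    """One turn: condense, retrieve, answer, then record the exchange."""
    standalone = condense_question(chat_history, question, llm)
    docs = retriever(standalone)
    context = "\n".join(docs)
    answer = llm(f"{context}\nQuestion: {standalone}\nAnswer:")
    chat_history.append((question, answer))
    return answer


# Stubs so the sketch runs without a model endpoint or a vector store.
def stub_llm(prompt):
    if prompt.endswith("Standalone question:"):
        return "Can I use Amazon SageMaker for training models?"
    return "stub answer"


def stub_retriever(query):
    return [f"document matching: {query}"]


history = []
chat_turn("What is Amazon SageMaker?", history, stub_llm, stub_retriever)
chat_turn("Can I use it for training models?", history, stub_llm, stub_retriever)
print(history)
```

The quality of the condensed question matters: if the model rewrites a follow-up poorly, retrieval runs against the wrong query and the final answer suffers.
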

In [51]:
from langchain.prompts import PromptTemplate  # used by create_prompt_template below
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT


def create_prompt_template():
    _template = """
    Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you do not know, do not try to make up an answer.
        Chat History:
        {chat_history}
        Follow Up Input: {question}
        Standalone question:
    """
    CONVO_QUESTION_PROMPT = PromptTemplate.from_template(_template)
    return CONVO_QUESTION_PROMPT


memory_chain = ConversationBufferMemory(memory_key="chat_history", input_key="question", return_messages=True)
# chat_history is never updated in this cell; the memory object is what tracks the conversation
chat_history = []
qa = ConversationalRetrievalChain.from_llm(
    llm=sm_llm, 
    #retriever=vectorstore_faiss_aws.as_retriever(), 
    retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 3}),
    memory=memory_chain,
    #verbose=True,
    condense_question_prompt=create_prompt_template(), #CONDENSE_QUESTION_PROMPT, # use the condense prompt template
    #chain_type='map_reduce',
    max_tokens_limit=100
    #combine_docs_chain_kwargs=key_chain_args,

)
print("Starting chat bot")
input_str = ['Enter your query, q to quit']
while True:
    query = input(str(input_str))
    if query in ('q', 'Q', 'quit'):
        print("Breaking")
        break
    # Run one turn, then append the exchange to the text shown at the next input prompt
    result = qa.run({'question': query, 'chat_history': chat_history})
    input_str.append(f"Question:{query}\nAI:Answer:{result}")

print("Thank you , that was a nice chat !!")

Starting chat bot


['Enter your query, q to quit'] what is Amazon SageMaker? 


CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=242:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['a machine learning platform']}:


['Enter your query, q to quit', 'Question:what is Amazon SageMaker? \nAI:Answer:a machine learning platform'] Can I use it for training models? 


ContentHandlerSMLMI::LangChain:::LEN:input_str=487:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['What is Amazon SageMaker?']}:
CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=241:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['a machine learning platform']}:


['Enter your query, q to quit', 'Question:what is Amazon SageMaker? \nAI:Answer:a machine learning platform', 'Question:Can I use it for training models? \nAI:Answer:a machine learning platform'] quit


Breaking
Thank you , that was a nice chat !!
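
The chain above is built with `max_tokens_limit=100`. One plausible reading of that setting is a budget on the retrieved text passed to the model, where documents are dropped from the end of the ranked list until the total fits. The sketch below works under that assumption. It is not LangChain's code: both function names are hypothetical, and a whitespace word count stands in for the model's tokenizer.

```python
def count_tokens(text):
    """Crude token count: whitespace-separated words (an assumption)."""
    return len(text.split())


def trim_to_token_limit(docs, max_tokens):
    """Keep leading docs, dropping from the end until the total fits the budget."""
    tokens = [count_tokens(d) for d in docs]
    num_docs = len(docs)
    total = sum(tokens)
    while num_docs > 0 and total > max_tokens:
        num_docs -= 1
        total -= tokens[num_docs]
    return docs[:num_docs]


ranked_docs = [
    "most relevant chunk " * 10,   # 30 words
    "second chunk " * 30,          # 60 words
    "third chunk " * 20,           # 40 words
]
kept = trim_to_token_limit(ranked_docs, max_tokens=100)
print(len(kept))
```

With a budget this small relative to the chunk size, only a little retrieved text reaches the model, which may contribute to the short, generic answers seen in the output.
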


#### Refine as chain type with the default retriever (no explicit similarity search arguments)
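
As a rough illustration of the `refine` strategy: the model drafts an answer from the first document, then revisits that answer once for each remaining document. The sketch below uses a hypothetical stub in place of the LLM, and the prompt wording is illustrative, not LangChain's.

```python
def refine_answer(question, docs, llm):
    """Draft an answer from the first doc, then refine it with each later doc."""
    if not docs:
        return ""
    answer = llm(f"Context: {docs[0]}\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        answer = llm(
            f"Question: {question}\n"
            f"Existing answer: {answer}\n"
            f"New context: {doc}\n"
            "Refined answer:"
        )
    return answer


calls = []


def stub_llm(prompt):
    """Stub model: records each prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft {len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
final = refine_answer("What is Amazon SageMaker?", docs, stub_llm)
print(final, "after", len(calls), "model calls")
```

One model call is made per document, so `refine` adds latency compared with `stuff` for the same number of retrieved documents, and a weak intermediate answer can carry through to the final one.
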

In [50]:
from langchain.prompts import PromptTemplate  # used by create_prompt_template below
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.chains import LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT


def create_prompt_template():
    _template = """
    Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you do not know, do not try to make up an answer.
        Chat History:
        {chat_history}
        Follow Up Input: {question}
        Standalone question:
    """
    CONVO_QUESTION_PROMPT = PromptTemplate.from_template(_template)
    return CONVO_QUESTION_PROMPT


memory_chain = ConversationBufferMemory(memory_key="chat_history", input_key="question", return_messages=True)
chat_history = []
qa = ConversationalRetrievalChain.from_llm(
        llm=sm_llm, 
        retriever=vectorstore_faiss_aws.as_retriever(), 
        #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 2}),
        memory=memory_chain,
        #verbose=True,
        condense_question_prompt=create_prompt_template(),  # custom prompt instead of CONDENSE_QUESTION_PROMPT
        chain_type='refine',  # options: 'stuff', 'map_reduce', 'refine', 'map_rerank'
        max_tokens_limit=100,
        get_chat_history=lambda h : h,
)  
print("Starting Refine chat bot")
input_str = ['Enter your query, q to quit']
while True:
    query = input(str(input_str))
    if query in ('q', 'Q', 'quit'):
        print("Breaking")
        break
    # Run one turn, then append the exchange to the text shown at the next input prompt
    result = qa.run({'question': query, 'chat_history': chat_history})
    input_str.append(f"Question:{query}\nAI:Answer:{result}")

print("Thank you , that was a nice chat !!")


Starting Refine chat bot


['Enter your query, q to quit'] What is Amazon SageMaker? 


CustomHFEmbeddings::QUERY::shape:returned -- > (768,):
ContentHandlerSMLMI::LangChain:::LEN:input_str=203:: will truncate if > 10000::
ContentHandlerSMLMI::LangChain::output={'generated_texts': ['                                                                                                   ']}:


['Enter your query, q to quit', 'Question:What is Amazon SageMaker? \nAI:Answer:                                                                                                   '] quit


Breaking
Thank you , that was a nice chat !!
