# Conversational Interface - Chatbot with Titan LLM

In this notebook, we will build a chatbot using the Foundation Models (FMs) in Amazon Bedrock. For our use case, we use Titan as the FM for building the chatbot.

Amazon Bedrock currently supports the following Titan models:

| Provider | Model Name | Versions | `id` |
| --- | --- | --- | --- |
| Amazon | Titan Text | Large | `amazon.titan-tg1-large` |

## Overview

Conversational interfaces such as chatbots and virtual assistants can enhance the user experience for your customers. Chatbots use natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. They can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users, and they can be accessed through channels such as websites, social media platforms, and messaging apps.


## Chatbot using Amazon Bedrock

![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)


## Use Cases

1. **Chatbot (Basic)** - Zero-shot chatbot with an FM
2. **Chatbot using prompt template (LangChain)** - Chatbot with some context provided in the prompt template
3. **Chatbot with persona** - Chatbot with defined roles, e.g. career coach and human interactions
4. **Context-aware chatbot** - Passing in context from an external file by generating embeddings

## LangChain framework for building Chatbot with Amazon Bedrock
In conversational interfaces such as chatbots, it is important to remember previous interactions, both in the short term and in the long term.

LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains.
It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots.
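To make the idea of conversational memory concrete, here is a minimal plain-Python sketch of a buffer-style memory. It is not LangChain code: `SimpleBufferMemory` and its methods are hypothetical names chosen for illustration, and the AI reply is made-up text. The sketch only shows the general pattern of storing earlier turns and prepending them to the next prompt.

```python
# Illustrative only: a plain-Python stand-in for buffer-style conversation memory.
# SimpleBufferMemory is a hypothetical name, not a LangChain class.
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    turns: list = field(default_factory=list)  # (speaker, text) pairs in order

    def add_user_message(self, text: str) -> None:
        self.turns.append(("Human", text))

    def add_ai_message(self, text: str) -> None:
        self.turns.append(("AI", text))

    def as_prompt_history(self) -> str:
        # Render every stored turn so the model can see the earlier conversation.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


memory_sketch = SimpleBufferMemory()
memory_sketch.add_user_message("Give me a few tips on how to start a new garden.")
memory_sketch.add_ai_message("Pick a sunny spot and start with healthy soil.")

# The follow-up never says "garden"; the stored history supplies that context.
follow_up = "Cool. Will that work with tomatoes?"
prompt_text = f"{memory_sketch.as_prompt_history()}\nHuman: {follow_up}\nAI:"
print(prompt_text)
```

Because the whole history is sent on every turn, a buffer like this grows with the conversation, which is why longer chats may need a summarizing or windowed memory instead.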

## Building Chatbot with Context - Key Elements

The first process in building a context-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process that runs your documents through an embedding model and stores the resulting embeddings in a vector store. In this example we use the Titan embeddings model for this.

![Embeddings](./images/embeddings_lang.png)

The second process is user request orchestration: handling the interaction, invoking the model, and returning the results.

![Chatbot](./images/chatbot_lang.png)

## Architecture [Context Aware Chatbot]
![4](./images/context-aware-chatbot.png)

#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️

For a detailed description of what the following cells do, refer to the [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook.

In [None]:
# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell
%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall

### Installing the dependencies

In [None]:
%pip install faiss-cpu==1.7.4 --quiet
%pip install pypdf==3.8.1 --quiet
%pip install langchain==0.0.190 --quiet
%pip install ipywidgets==7.7.0

In [None]:
#### Uncomment the following lines to run from your local environment outside of the AWS account with Bedrock access

#import os
#os.environ['BEDROCK_ASSUME_ROLE'] = '<YOUR_VALUES>'
#os.environ['AWS_PROFILE'] = '<YOUR_VALUES>'

In [None]:
import boto3
import json
import os
import sys

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'
boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

## Chatbot (Basic - without context)

#### Using ConversationChain from LangChain to start the conversation
Chatbots need to remember previous interactions, and conversational memory allows us to do that. There are several ways to implement conversational memory. In the context of LangChain, they are all built on top of the ConversationChain.

Note: The model outputs are non-deterministic

In [None]:
from langchain.llms.bedrock import Bedrock

titan_llm = Bedrock(model_id="amazon.titan-tg1-large", client=boto3_bedrock)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=titan_llm, verbose=True, memory=memory
)

print_ww(conversation.predict(input="Hi there!"))

#### New Questions

The model has responded with an initial message. Let's ask a few questions.

In [None]:
print_ww(conversation.predict(input="Give me a few tips on how to start a new garden."))

#### Build on the questions

Let's ask a question without mentioning the word garden to see if the model can understand the previous conversation.

In [None]:
print_ww(conversation.predict(input="Cool. Will that work with tomatoes?"))

#### Finishing this conversation

In [None]:
print_ww(conversation.predict(input="That's all, thank you!"))

## Chatbot using prompt template (LangChain)

`PromptTemplate` is responsible for constructing the input that is sent to the model. LangChain provides several classes and functions to make constructing and working with prompts easy. We will use the default prompt template here. See [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html).
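As a rough illustration of what a prompt template does, the sketch below fills named placeholders with plain `str.format`. The template text is a made-up example and is not claimed to match LangChain's default prompt; `EXAMPLE_TEMPLATE` and `render_prompt` are hypothetical names.

```python
# Illustrative only: what a prompt template does, using plain str.format.
# The template text below is a made-up example, not LangChain's default prompt.
EXAMPLE_TEMPLATE = (
    "The following is a conversation between a human and an AI.\n\n"
    "Current conversation:\n{history}\n"
    "Human: {input}\n"
    "AI:"
)


def render_prompt(history: str, user_input: str) -> str:
    # Fill the named placeholders; the result is the text sent to the model.
    return EXAMPLE_TEMPLATE.format(history=history, input=user_input)


rendered = render_prompt(
    history="Human: Hi there!\nAI: Hello! How can I help?",
    user_input="Give me a few tips on how to start a new garden.",
)
print(rendered)
```

Printing the chain's own template, as the next cell does, shows the actual placeholders that the chain expects.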

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain import PromptTemplate

chat_history = []

# turn verbose to true to see the full logs and documents
qa= ConversationChain(
    llm=titan_llm, verbose=False, memory=ConversationBufferMemory() #memory_chain
)

print(f"ChatBot:DEFAULT:PROMPT:TEMPLATE: is ={qa.prompt.template}")

In [None]:
import ipywidgets as ipw
from IPython.display import display, clear_output

class ChatUX:
    """ A chat UX using IPWidgets
    """
    def __init__(self, qa, retrievalChain = False):
        self.qa = qa
        self.name = None
        self.b=None
        self.retrievalChain = retrievalChain
        self.out = ipw.Output()


    def start_chat(self):
        print("Starting chat bot")
        display(self.out)
        self.chat(None)


    def chat(self, _):
        if self.name is None:
            prompt = ""
        else: 
            prompt = self.name.value
        if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:
            print("Thank you , that was a nice chat !!")
            return
        elif len(prompt) > 0:
            with self.out:
                thinking = ipw.Label(value="Thinking...")
                display(thinking)
                try:
                    if self.retrievalChain:
                        result = self.qa.run({'question': prompt })
                    else:
                        result = self.qa.run({'input': prompt }) #, 'history':chat_history})
                except Exception:
                    result = "No answer"
                thinking.value=""
                print_ww(f"AI:{result}")
                self.name.disabled = True
                self.b.disabled = True
                self.name = None
            
        if self.name is None:
            with self.out:
                self.name = ipw.Text(description="You:", placeholder='q to quit')
                self.b = ipw.Button(description="Send")
                self.b.on_click(self.chat)
                display(ipw.Box(children=(self.name, self.b)))

Let's start a chat

In [None]:
chat = ChatUX(qa)
chat.start_chat()

## Chatbot with persona

The AI assistant will play the role of a career coach. Role-play dialogue requires a user message to be set before starting the chat. `ConversationBufferMemory` is used to pre-populate the dialog.

In [None]:
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("You will be acting as a career coach. Your goal is to give career advice to users")
memory.chat_memory.add_ai_message("I am career coach and give career advice")
titan_llm = Bedrock(model_id="amazon.titan-tg1-large",client=boto3_bedrock)
conversation = ConversationChain(
     llm=titan_llm, verbose=True, memory=memory
)

print_ww(conversation.predict(input="What are the career options in AI?"))

##### Let's ask a question outside this persona's specialty. The model shouldn't answer it and should give a reason why.

In [None]:
conversation.verbose = False
print_ww(conversation.predict(input="How to fix my car?"))

## Chatbot with Context 
In this use case we will ask the chatbot to answer questions from the context that it was passed. We will take a CSV file and use the Titan embeddings model to create the vectors, which are stored in FAISS. When the chatbot is asked a question, the vector store is searched for the most relevant chunks, and those chunks are passed to the model to produce the answer.

#### Use the Titan embeddings model to generate the embeddings for the documents

Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.
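The following sketch shows the kind of mathematical operation this enables. It uses hand-written three-dimensional vectors as stand-ins for real embeddings (real models return vectors with many more dimensions), and the numbers are invented purely so that related words point in similar directions.

```python
# Illustrative only: hand-written 3-dimensional vectors stand in for real embeddings.
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: "tomato" and "garden" are deliberately placed close together.
toy_vectors = {
    "tomato": [0.9, 0.1, 0.0],
    "garden": [0.8, 0.3, 0.1],
    "invoice": [0.0, 0.2, 0.9],
}

query = toy_vectors["tomato"]
ranked = sorted(
    (word for word in toy_vectors if word != "tomato"),
    key=lambda word: cosine_similarity(query, toy_vectors[word]),
    reverse=True,
)
print(ranked)  # ['garden', 'invoice']
```

A vector store applies the same comparison at scale: the question is embedded, and the stored chunks whose vectors are closest to it are returned.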


This will be used for the retrieval-augmented generation (RAG) [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).

Other possible embeddings are listed here: [LangChain Embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html)

In [None]:
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import FAISS
from langchain import PromptTemplate

br_embeddings = BedrockEmbeddings(client=boto3_bedrock)

#### Create the embeddings for document search

#### Vector store indexer. 

This is what stores and matches the embeddings. This notebook uses FAISS, and the index is transient and held in memory. The VectorStore APIs are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html).

We will use the Titan embeddings model through `BedrockEmbeddings`, which needs a reference to the Bedrock client, to generate the embeddings. FAISS stores them in memory and uses them whenever the user runs a query.

#### VectorStore as FAISS

You can read about the [FAISS](https://arxiv.org/pdf/1702.08734.pdf) in-memory vector store here. Other vector stores, such as the two described below, could be used in its place for this example.

Chroma

[Chroma](https://www.trychroma.com/) is a super simple vector search database. The core-API consists of just four functions, allowing users to build an in-memory document-vector store. By default Chroma uses the Hugging Face transformers library to vectorize documents.

Weaviate

[Weaviate](https://github.com/weaviate/weaviate) is a polished tool: it offers a GraphQL API with support for vector search, and it also allows users to vectorize their content using Weaviate's built-in modules or custom modules.

In [None]:
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

s3_path = f"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv"
!aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv

loader = CSVLoader("./rag_data/Amazon_SageMaker_FAQs.csv") # --- > 219 docs with 400 chars
documents_aws = loader.load() #
print(f"documents:loaded:size={len(documents_aws)}")

docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=",").split_documents(documents_aws)

print(f"Documents:after split and chunking size={len(docs)}")

vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding = br_embeddings, 
    #**k_args
)

print(f"vectorstore_faiss_aws:created={vectorstore_faiss_aws}::")


#### To run a quick low code test 

We can use a wrapper class provided by LangChain to query the vector database and return the relevant documents. Behind the scenes this only runs a QA chain with all default values.
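Conceptually, such a query has two steps: retrieve the most relevant chunks, then place them in a prompt ahead of the question. The sketch below imitates that flow in plain Python. The word-overlap scoring is a deliberately crude stand-in for vector similarity, the chunk text is made up, and `retrieve` and `build_qa_prompt` are hypothetical names, not LangChain functions.

```python
# Illustrative only: the retrieve-then-answer flow with a toy keyword retriever.
# The function names and the word-overlap scoring are assumptions for this sketch.
def retrieve(question, chunks, k=2):
    # Score each chunk by how many question words it shares (a stand-in for
    # vector similarity), then keep the k best.
    question_words = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda chunk: len(question_words & set(chunk.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_qa_prompt(question, chunks):
    # Place the retrieved chunks in a single prompt ahead of the question.
    context = "\n\n".join(chunks)
    return f"{context}\n\nQuestion: {question}\nAnswer:"


# Made-up example text standing in for the loaded documents.
toy_chunks = [
    "Tomatoes grow best in full sun.",
    "Carrots prefer loose sandy soil.",
    "Water the beds early in the morning.",
]
question_text = "Will tomatoes grow in shade"
best_chunks = retrieve(question_text, toy_chunks, k=1)
qa_prompt = build_qa_prompt(question_text, best_chunks)
print(qa_prompt)
```

In the notebook, FAISS and the Titan embeddings take the place of `retrieve`, and the Titan LLM completes the prompt.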

In [None]:
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)
print_ww(wrapper_store_faiss.query("R in SageMaker", llm=titan_llm))

#### Chatbot application

For the chatbot we need context management, history, vector stores, and many other things. We will start with a `ConversationalRetrievalChain`.

This uses conversation memory and a retrieval QA chain, which allows passing in chat history that can be used for follow-up questions. Source: https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html

Set verbose to True to see what is going on behind the scenes.
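The role of chat history in follow-up questions can be sketched as follows: the history and the new question are combined into one prompt that asks the model for a standalone question, and that standalone question is what gets used for retrieval. The template wording and `build_condense_prompt` are assumptions made for this sketch, not LangChain's own prompt or API, and the example turns are invented.

```python
# Illustrative only: how chat history and a follow-up are combined into one prompt
# that asks the model for a standalone question. The template wording and the
# function name are assumptions for this sketch, not LangChain's own prompt.
CONDENSE_TEMPLATE = (
    "Chat history:\n{chat_history}\n\n"
    "Follow-up question: {question}\n"
    "Rephrase the follow-up as a standalone question:"
)


def build_condense_prompt(chat_history, question):
    # chat_history is a list of (human_message, ai_message) pairs.
    history_text = "\n".join(
        f"Human: {human}\nAI: {ai}" for human, ai in chat_history
    )
    return CONDENSE_TEMPLATE.format(chat_history=history_text, question=question)


# Made-up example turns.
history_pairs = [
    ("What is Amazon SageMaker?", "It is a managed machine learning service."),
]
condense_prompt = build_condense_prompt(history_pairs, "Does it support R?")
print(condense_prompt)
```

Without this step, a follow-up such as "Does it support R?" would be embedded on its own, and the retriever would have no way to know what "it" refers to.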

In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT


def create_prompt_template():
    _template = """{chat_history}

Answer only with the new question.
How would you ask the question considering the previous conversation: {question}
Question:"""
    CONVO_QUESTION_PROMPT = PromptTemplate.from_template(_template)
    return CONVO_QUESTION_PROMPT

memory_chain = ConversationBufferMemory(memory_key="chat_history", input_key="question", return_messages=True)
chat_history=[]

#### Parameters used for ConversationalRetrievalChain
retriever: We used `VectorStoreRetriever`, which is backed by a `VectorStore`. To retrieve text, there are two search types you can choose with `search_type`: "similarity" or "mmr". `search_type="similarity"` uses similarity search in the retriever object, selecting the text chunk vectors that are most similar to the question vector.

memory: Memory chain to store the history.

condense_question_prompt: Given a question from the user, we use the previous conversation and that question to make up a standalone question.

chain_type: How the retrieved documents are combined into the prompt. If the content is long and doesn't fit the context, you can change this parameter. The options are "stuff", "refine", "map_reduce", and "map_rerank".

Note: If the question asked is outside the scope of the context passed, the model will reply that it doesn't know the answer.
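To illustrate the difference between the two search types, here is a small self-contained sketch. It is not the LangChain implementation: the vectors are made up, `top_k_similarity` and `top_k_mmr` are hypothetical helpers, and the `lambda_mult` weighting is an assumption about how relevance and diversity might be traded off in a maximal marginal relevance (MMR) selection.

```python
# Illustrative only: plain similarity ranking versus a greedy MMR selection.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_similarity(query, vectors, k):
    # Rank every chunk by similarity to the query and keep the best k.
    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]


def top_k_mmr(query, vectors, k, lambda_mult=0.5):
    # Greedily pick chunks that are relevant to the query but not redundant
    # with the chunks already selected.
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lambda_mult * cosine(query, vectors[i]) - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
chunk_vecs = [
    [1.0, 0.1],    # chunk 0: very close to the query
    [1.0, 0.12],   # chunk 1: near-duplicate of chunk 0
    [0.8, -0.6],   # chunk 2: less similar, but different content
]

print(top_k_similarity(query_vec, chunk_vecs, k=2))  # [0, 1]
print(top_k_mmr(query_vec, chunk_vecs, k=2))         # [0, 2]
```

Plain similarity returns the two near-duplicate chunks, while the MMR-style selection swaps the duplicate for a chunk that adds different information. That trade-off is the usual reason to choose "mmr" when the source documents contain repeated text.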

In [None]:
# turn verbose to true to see the full logs and documents
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.chains import ConversationalRetrievalChain
qa = ConversationalRetrievalChain.from_llm(
    llm=titan_llm, 
    retriever=vectorstore_faiss_aws.as_retriever(), 
    #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 8}),
    memory=memory_chain,
    #verbose=True,
    #condense_question_prompt=CONDENSE_QUESTION_PROMPT, # create_prompt_template(), 
    chain_type='stuff', # 'refine',
    #max_tokens_limit=100
)

qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template("""
{context}

Use at maximum 3 sentences to answer the question inside the <q></q> XML tags. 

<q>{question}</q>

Do not use any XML tags in the answer. If the answer is not in the context say "Sorry, I don't know, as the answer was not found in the context."

Answer:""")

Let's start a chat

In [None]:
chat = ChatUX(qa, retrievalChain=True)
chat.start_chat()

### In this demo we used the Titan LLM to create a conversational interface with the following patterns:

1. Chatbot (Basic - without context)

2. Chatbot using prompt template(Langchain)

3. Chatbot with personas

4. Chatbot with context