{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Conversational Interface - Chatbot with Claude LLM\n", "\n", "In this notebook, we will build a chatbot using the Foundational Models (FMs) in Amazon Bedrock. For our use-case we use Claude as our FM for building the chatbot.\n", "\n", "Amazon Bedrock currently supports the following Claude models:\n", "| Provider | Model Name | Versions | `id` |\n", "| --- | --- | --- | --- |\n", "| Anthropic | Claude | V1, Instant | `anthropic.claude-v1`, `anthropic.claude-instant-v1` |\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers.Chatbots uses natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. Chatbots can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users. They can be accessed through various channels such as websites, social media platforms, and messaging apps.\n", "\n", "\n", "## Chatbot using Amazon Bedrock\n", "\n", "![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)\n", "\n", "\n", "## Use Cases\n", "\n", "1. **Chatbot (Basic)** - Zero Shot chatbot with a FM model\n", "2. **Chatbot using prompt** - template(Langchain) - Chatbot with some context provided in the prompt template\n", "3. **Chatbot with persona** - Chatbot with defined roles. i.e. Career Coach and Human interactions\n", "4. **Contextual-aware chatbot** - Passing in context through an external file by generating embeddings.\n", "\n", "## Langchain framework for building Chatbot with Amazon Bedrock\n", "In Conversational interfaces such as chatbots, it is highly important to remember previous interactions, both at a short term but also at a long term level.\n", "\n", "LangChain provides memory components in two forms. First, LangChain provides helper utilities for managing and manipulating previous chat messages. These are designed to be modular and useful regardless of how they are used. Secondly, LangChain provides easy ways to incorporate these utilities into chains.\n", "It allows us to easily define and interact with different types of abstractions, which make it easy to build powerful chatbots.\n", "\n", "## Building Chatbot with Context - Key Elements\n", "\n", "The first process in a building a contextual-aware chatbot is to **generate embeddings** for the context. Typically, you will have an ingestion process which will run through your embedding model and generate the embeddings which will be stored in a sort of a vector store. In this example we are using a GPT-J embeddings model for this\n", "\n", "![Embeddings](./images/embeddings_lang.png)\n", "\n", "Second process is the user request orchestration , interaction, invoking and returing the results\n", "\n", "![Chatbot](./images/chatbot_lang.png)\n", "\n", "## Architecture [Context Aware Chatbot]\n", "![4](./images/context-aware-chatbot.png)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### ⚠️⚠️⚠️ Execute the following cells before running this notebook ⚠️⚠️⚠️\n", "\n", "For a detailed description on what the following cells do refer to [Bedrock boto3 setup](../00_Intro/bedrock_boto3_setup.ipynb) notebook." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make sure you run `download-dependencies.sh` from the root of the repository to download the dependencies before running this cell\n", "%pip install ../dependencies/botocore-1.29.162-py3-none-any.whl ../dependencies/boto3-1.26.162-py3-none-any.whl ../dependencies/awscli-1.27.162-py3-none-any.whl --force-reinstall" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false } }, "source": [ "### Installing the dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%pip install faiss-cpu==1.7.4 --quiet\n", "%pip install pypdf==3.8.1 --quiet\n", "%pip install langchain==0.0.190 --quiet\n", "%pip install ipywidgets==7.7.0" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#### Un comment the following lines to run from your local environment outside of the AWS account with Bedrock access\n", "\n", "#import os\n", "#os.environ['BEDROCK_ASSUME_ROLE'] = ''\n", "#os.environ['AWS_PROFILE'] = ''" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import json\n", "import os\n", "import sys\n", "\n", "module_path = \"..\"\n", "sys.path.append(os.path.abspath(module_path))\n", "from utils import bedrock, print_ww\n", "\n", "os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'\n", "boto3_bedrock = bedrock.get_bedrock_client(os.environ.get('BEDROCK_ASSUME_ROLE', None))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Chatbot (Basic - without context)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "#### We use [CoversationChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/bedrock.html?highlight=ConversationChain#using-in-a-conversation-chain) from LangChain to start the conversation. We also use the [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html) for storing the messages. We can also get the history as a list of messages (this is very useful in a chat model).\n", "Chatbots needs to remember the previous interactions. Conversational memory allows us to do that. There are several ways that we can implement conversational memory. 
In the context of LangChain, they are all built on top of the ConversationChain.\n", "\n", "**Note:** The model outputs are non-deterministic" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [] },
{ "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains import ConversationChain\n", "from langchain.memory import ConversationBufferMemory\n", "from langchain import PromptTemplate" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2023-06-15T20:35:32.414119Z", "start_time": "2023-06-15T20:35:31.605208Z" }, "collapsed": false }, "outputs": [], "source": [ "from langchain.llms.bedrock import Bedrock\n", "\n", "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\", client=boto3_bedrock, model_kwargs={\"max_tokens_to_sample\": 1000})\n", "memory = ConversationBufferMemory()\n", "conversation = ConversationChain(\n", "    llm=cl_llm, verbose=True, memory=memory\n", ")\n", "\n", "print_ww(conversation.predict(input=\"Hi there!\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "What happens here? We said \"Hi there!\" and the model spat out several conversation turns. This is because the default prompt used by the LangChain ConversationChain is not well designed for Claude. An [effective Claude prompt](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design) marks the turns with `\\n\\nHuman:` and `\\n\\nAssistant:` and ends with `\\n\\nAssistant:`. Let's fix this.\n", "\n", "To learn more about how to write prompts for Claude, check the [Anthropic documentation](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design)." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Chatbot using prompt template (LangChain)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "LangChain provides several classes and functions to make constructing and working with prompts easy. We are going to use the [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html) class to construct the prompt from an f-string template. " ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.memory import ConversationBufferMemory\n", "from langchain import PromptTemplate\n", "\n", "# turn verbose to true to see the full logs and documents\n", "conversation = ConversationChain(\n", "    llm=cl_llm, verbose=False, memory=ConversationBufferMemory()  # memory_chain\n", ")\n", "\n", "# LangChain prompts do not always work with all models. This prompt is tuned for Claude\n", "claude_prompt = PromptTemplate.from_template(\"\"\"The following is a friendly conversation between a human and an AI.\n", "The AI is talkative and provides lots of specific details from its context. 
If the AI does not know\n", "the answer to a question, it truthfully says it does not know.\n", "\n", "Current conversation:\n", "{history}\n", "\n", "\n", "Human: {input}\n", "\n", "\n", "Assistant:\n", "\"\"\")\n", "\n", "conversation.prompt = claude_prompt\n", "\n", "print_ww(conversation.predict(input=\"Hi there!\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### New Questions\n", "\n", "The model has responded with an initial message; let's ask a few questions." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "print_ww(conversation.predict(input=\"Give me a few tips on how to start a new garden.\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Build on the questions\n", "\n", "Let's ask a question without mentioning the word garden to see if the model can understand the previous conversation." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print_ww(conversation.predict(input=\"Cool. Will that work with tomatoes?\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Finishing this conversation" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print_ww(conversation.predict(input=\"That's all, thank you!\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Claude is still really talkative. Try changing the prompt to make Claude provide shorter answers." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Interactive session using ipywidgets\n", "\n", "The following utility class allows us to interact with Claude in a more natural way. We write our question in an input box and get Claude's answer. We can then continue our conversation."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ipywidgets as ipw\n", "from IPython.display import display, clear_output\n", "\n", "class ChatUX:\n", " \"\"\" A chat UX using IPWidgets\n", " \"\"\"\n", " def __init__(self, qa, retrievalChain = False):\n", " self.qa = qa\n", " self.name = None\n", " self.b=None\n", " self.retrievalChain = retrievalChain\n", " self.out = ipw.Output()\n", "\n", "\n", " def start_chat(self):\n", " print(\"Starting chat bot\")\n", " display(self.out)\n", " self.chat(None)\n", "\n", "\n", " def chat(self, _):\n", " if self.name is None:\n", " prompt = \"\"\n", " else: \n", " prompt = self.name.value\n", " if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:\n", " print(\"Thank you , that was a nice chat !!\")\n", " return\n", " elif len(prompt) > 0:\n", " with self.out:\n", " thinking = ipw.Label(value=\"Thinking...\")\n", " display(thinking)\n", " try:\n", " if self.retrievalChain:\n", " result = self.qa.run({'question': prompt })\n", " else:\n", " result = self.qa.run({'input': prompt }) #, 'history':chat_history})\n", " except:\n", " result = \"No answer\"\n", " thinking.value=\"\"\n", " print_ww(f\"AI:{result}\")\n", " self.name.disabled = True\n", " self.b.disabled = True\n", " self.name = None\n", " \n", " if self.name is None:\n", " with self.out:\n", " self.name = ipw.Text(description=\"You:\", placeholder='q to quit')\n", " self.b = ipw.Button(description=\"Send\")\n", " self.b.on_click(self.chat)\n", " display(ipw.Box(children=(self.name, self.b)))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's start a chat. You can also test the following questions:\n", "1. tell me a joke\n", "2. tell me another joke\n", "3. what was the first joke about\n", "4. can you make another joke on the same topic of the first joke" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chat = ChatUX(conversation)\n", "chat.start_chat()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Chatbot with persona" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "AI assistant will play the role of a career coach. Role Play Dialogue requires user message to be set in before starting the chat. ConversationBufferMemory is used to pre-populate the dialog" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# store previous interactions using ConversationalBufferMemory and add custom prompts to the chat.\n", "memory = ConversationBufferMemory()\n", "memory.chat_memory.add_user_message(\"You will be acting as a career coach. Your goal is to give career advice to users\")\n", "memory.chat_memory.add_ai_message(\"I am career coach and give career advice\")\n", "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\",client=boto3_bedrock)\n", "conversation = ConversationChain(\n", " llm=cl_llm, verbose=True, memory=memory\n", ")\n", "\n", "conversation.prompt = claude_prompt\n", "\n", "print_ww(conversation.predict(input=\"What are the career options in AI?\"))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print_ww(conversation.predict(input=\"What these people really do? 
Is it fun?\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "##### Let's ask a question that is not the specialty of this persona. The model shouldn't answer that question and should give a reason for that." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "conversation.verbose = False\n", "print_ww(conversation.predict(input=\"How to fix my car?\"))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Chatbot with Context\n", "In this use case we will ask the chatbot to answer questions from an external corpus it has likely never seen before. To do this we apply a pattern called RAG (Retrieval Augmented Generation): the idea is to index the corpus in chunks, then look up which sections of the corpus might be relevant to provide an answer by using semantic similarity between the chunks and the question. Finally, the most relevant chunks are aggregated and passed as context to the ConversationChain, similar to providing a history.\n", "\n", "We will take a CSV file and use the **Titan Embeddings Model** to create a vector for each line of the CSV. These vectors are then stored in FAISS, an open-source library providing an in-memory vector datastore. When the chatbot is asked a question, we query FAISS with the question and retrieve the text which is semantically closest; that text is passed to the model as context to generate the answer." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Titan embeddings Model\n", "\n", "Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.\n", "\n", "Embeddings are, for example, used for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).\n", "\n", "Other possible uses for embeddings can be found here: [LangChain Embeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from langchain.embeddings import BedrockEmbeddings\n", "\n", "br_embeddings = BedrockEmbeddings(client=boto3_bedrock)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### FAISS as VectorStore\n", "\n", "In order to be able to use embeddings for search, we need a store that can efficiently perform vector similarity searches. In this notebook we use FAISS, which is an in-memory store. To store vectors permanently, one can use pgvector, Pinecone, or Chroma.\n", "\n", "The LangChain VectorStore APIs are available [here](https://python.langchain.com/en/harrison-docs-refactor-3-24/reference/modules/vectorstore.html).\n", "\n", "To know more about the FAISS vector store, please refer to this [document](https://arxiv.org/pdf/1702.08734.pdf)."
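, "\n", "\n", "Before building the index, it can help to sanity-check the embeddings themselves. The sketch below is illustrative and not part of the original flow; it assumes the `br_embeddings` object created above and that numpy is available in the environment. It compares two related questions against an unrelated one using cosine similarity, which is the same idea behind the similarity search FAISS performs below.\n", "\n", "```python\n", "import numpy as np\n", "\n", "def cosine_similarity(a, b):\n", "    a, b = np.array(a), np.array(b)\n", "    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\n", "\n", "v1 = br_embeddings.embed_query(\"What is Amazon SageMaker?\")\n", "v2 = br_embeddings.embed_query(\"Tell me about the SageMaker machine learning service\")\n", "v3 = br_embeddings.embed_query(\"How do I bake a chocolate cake?\")\n", "\n", "print(cosine_similarity(v1, v2))  # related questions -> higher score\n", "print(cosine_similarity(v1, v3))  # unrelated question -> lower score\n", "```"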
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.document_loaders import CSVLoader\n", "from langchain.text_splitter import CharacterTextSplitter\n", "from langchain.indexes.vectorstore import VectorStoreIndexWrapper\n", "from langchain.vectorstores import FAISS\n", "\n", "s3_path = \"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv\"\n", "!aws s3 cp $s3_path ./rag_data/Amazon_SageMaker_FAQs.csv\n", "\n", "loader = CSVLoader(\"./rag_data/Amazon_SageMaker_FAQs.csv\")  # ---> 219 docs of ~400 chars; each row consists of a question column and an answer column\n", "documents_aws = loader.load()\n", "print(f\"Number of documents={len(documents_aws)}\")\n", "\n", "docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator=\",\").split_documents(documents_aws)\n", "\n", "print(f\"Number of documents after split and chunking={len(docs)}\")\n", "\n", "vectorstore_faiss_aws = FAISS.from_documents(\n", "    documents=docs,\n", "    embedding=br_embeddings\n", ")\n", "\n", "print(f\"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::\")\n" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Semantic search\n", "\n", "We can use a wrapper class provided by LangChain to query the vector store and return the relevant documents. Behind the scenes this just runs a RetrievalQA chain." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss_aws)\n", "print_ww(wrapper_store_faiss.query(\"R in SageMaker\", llm=cl_llm))" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how the semantic search works:\n", "1. First we calculate the embedding vector for the query, and\n", "2. then we use this vector to do a similarity search on the store" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "v = br_embeddings.embed_query(\"R in SageMaker\")\n", "print(v[0:10])\n", "results = vectorstore_faiss_aws.similarity_search_by_vector(v, k=4)\n", "for r in results:\n", "    print_ww(r.page_content)\n", "    print('----')" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Memory\n", "In any chatbot we need a QA chain, with options customized to the use case. But in a chatbot we always need to keep the history of the conversation so the model can take it into consideration when providing the answer. In this example we use the [ConversationalRetrievalChain](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) from LangChain, together with a ConversationBufferMemory, to keep the history of the conversation.\n", "\n", "Source: https://python.langchain.com/docs/modules/chains/popular/chat_vector_db\n", "\n", "Set `verbose` to `True` to see what is going on behind the scenes."
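, "\n", "\n", "To make the memory concrete, here is a minimal, illustrative sketch (not part of the chain built below; `demo_memory` is a throwaway object) of what a ConversationBufferMemory stores and returns between turns:\n", "\n", "```python\n", "from langchain.memory import ConversationBufferMemory\n", "\n", "# Illustrative only: show what ConversationBufferMemory keeps between turns.\n", "demo_memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", "demo_memory.save_context({\"input\": \"What is SageMaker?\"},\n", "                         {\"output\": \"Amazon SageMaker is a managed machine learning service.\"})\n", "demo_memory.save_context({\"input\": \"Does it support R?\"},\n", "                         {\"output\": \"Yes, R is supported.\"})\n", "\n", "# These stored messages are what the chain passes back to the model on each turn.\n", "print(demo_memory.load_memory_variables({})[\"chat_history\"])\n", "```"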
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT\n", "\n", "print_ww(CONDENSE_QUESTION_PROMPT.template)" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Parameters used for ConversationalRetrievalChain\n", "* **retriever**: We used `VectorStoreRetriever`, which is backed by a `VectorStore`. To retrieve text, there are two search types you can choose: `\"similarity\"` or `\"mmr\"`. `search_type=\"similarity\"` uses similarity search in the retriever object, where it selects text chunk vectors that are most similar to the question vector.\n", "\n", "* **memory**: memory chain to store the conversation history\n", "\n", "* **condense_question_prompt**: Given a question from the user, we use the previous conversation and that question to make up a standalone question.\n", "\n", "* **chain_type**: Controls how the retrieved documents are combined and passed to the LLM; the options are `stuff`, `refine`, `map_reduce`, and `map_rerank`.\n", "\n", "If the question asked is outside the scope of the context, the model will reply that it doesn't know the answer.\n", "\n", "**Note**: if you are curious how the chain works, uncomment the `verbose=True` line." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# turn verbose to true to see the full logs and documents\n", "from langchain.chains import ConversationalRetrievalChain\n", "from langchain.memory import ConversationBufferMemory\n", "\n", "memory_chain = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", "qa = ConversationalRetrievalChain.from_llm(\n", "    llm=cl_llm,\n", "    retriever=vectorstore_faiss_aws.as_retriever(),\n", "    memory=memory_chain,\n", "    condense_question_prompt=CONDENSE_QUESTION_PROMPT,\n", "    # verbose=True,\n", "    chain_type='stuff',  # 'refine',\n", "    # max_tokens_limit=300\n", ")" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's chat! Ask the chatbot some questions about SageMaker, like:\n", "1. What is SageMaker?\n", "2. What is Canvas?" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chat = ChatUX(qa, retrievalChain=True)\n", "chat.start_chat()" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Your mileage might vary, but after 2 or 3 questions you will start to get some weird answers, in some cases even in other languages.\n", "This is happening for the same reason outlined at the beginning of this notebook: the default LangChain prompts are not optimal for Claude.\n", "In the following cell we are going to set two new prompts: one for the question rephrasing, and one to get the answer from that rephrased question."
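, "\n", "\n", "To see what the question-rephrasing step actually receives, here is a small illustrative snippet (it reuses the default `CONDENSE_QUESTION_PROMPT` imported earlier; the history and follow-up question are made-up examples):\n", "\n", "```python\n", "# Illustrative only: what the condense (rephrasing) step sees for a follow-up question.\n", "example = CONDENSE_QUESTION_PROMPT.format(\n", "    chat_history=\"Human: What is SageMaker?\\nAssistant: Amazon SageMaker is a managed ML service.\",\n", "    question=\"Does it support R?\",\n", ")\n", "print_ww(example)\n", "```"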
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# turn verbose to true to see the full logs and documents\n", "from langchain.chains import ConversationalRetrievalChain\n", "from langchain.schema import BaseMessage\n", "\n", "\n", "# We are also providing a different chat history formatter which outputs the history as a Claude chat (i.e. including the \\n\\n separators)\n", "_ROLE_MAP = {\"human\": \"\\n\\nHuman: \", \"ai\": \"\\n\\nAssistant: \"}\n", "def _get_chat_history(chat_history):\n", "    buffer = \"\"\n", "    for dialogue_turn in chat_history:\n", "        if isinstance(dialogue_turn, BaseMessage):\n", "            role_prefix = _ROLE_MAP.get(dialogue_turn.type, f\"{dialogue_turn.type}: \")\n", "            buffer += f\"\\n{role_prefix}{dialogue_turn.content}\"\n", "        elif isinstance(dialogue_turn, tuple):\n", "            human = \"\\n\\nHuman: \" + dialogue_turn[0]\n", "            ai = \"\\n\\nAssistant: \" + dialogue_turn[1]\n", "            buffer += \"\\n\" + \"\\n\".join([human, ai])\n", "        else:\n", "            raise ValueError(\n", "                f\"Unsupported chat history format: {type(dialogue_turn)}.\"\n", "                f\" Full chat history: {chat_history} \"\n", "            )\n", "    return buffer\n", "\n", "# the condense prompt for Claude\n", "condense_prompt_claude = PromptTemplate.from_template(\"\"\"{chat_history}\n", "\n", "Answer only with the new question.\n", "\n", "\n", "Human: How would you ask the question considering the previous conversation: {question}\n", "\n", "\n", "Assistant: Question:\"\"\")\n", "\n", "# recreate the Claude LLM with more tokens to sample - this provides longer responses but introduces some latency\n", "cl_llm = Bedrock(model_id=\"anthropic.claude-v1\", client=boto3_bedrock, model_kwargs={\"max_tokens_to_sample\": 500})\n", "memory_chain = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n", "qa = ConversationalRetrievalChain.from_llm(\n", "    llm=cl_llm,\n", "    retriever=vectorstore_faiss_aws.as_retriever(),\n", "    # retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={\"k\": 8}),\n", "    memory=memory_chain,\n", "    get_chat_history=_get_chat_history,\n", "    # verbose=True,\n", "    condense_question_prompt=condense_prompt_claude,\n", "    chain_type='stuff',  # 'refine',\n", "    # max_tokens_limit=300\n", ")\n", "\n", "# the LLMChain prompt to get the answer. The ConversationalRetrievalChain does not expose this parameter in the constructor\n", "qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template(\"\"\"\n", "{context}\n", "\n", "\n", "Human: Use at maximum 3 sentences to answer the question inside the <q></q> XML tags.\n", "\n", "<q>{question}</q>\n", "\n", "Do not use any XML tags in the answer. If the answer is not in the context, say \"Sorry, I don't know as the answer was not found in the context.\"\n", "\n", "Assistant:\"\"\")" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's start another chat. Feel free to ask the following questions:\n", "\n", "1. What is SageMaker?\n", "2. What is Canvas?" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chat = ChatUX(qa, retrievalChain=True)\n", "chat.start_chat()" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### Do some prompt engineering\n", "\n", "You can \"tune\" your prompt to get more or less verbose answers. For example, try to change the number of sentences, or remove that instruction altogether. 
You might also need to increase the value of `max_tokens_to_sample` (e.g. 1000 or 2000) to get the full answer." ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### In this demo we used the Claude LLM to create a conversational interface with the following patterns:\n", "\n", "1. Chatbot (Basic - without context)\n", "\n", "2. Chatbot using prompt template (LangChain)\n", "\n", "3. Chatbot with persona\n", "\n", "4. Chatbot with context" ] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": 
false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated 
computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": 
"ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }