{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Question Answering over User Data with RAG (Retrieval-Augmented Generation)\n", "- Original code\n", " - https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# 1. Basic Environment Setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "# Add the shared source folder to the import path\n", "import sys\n", "sys.path.append('../common_code')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [], "source": [ "import time\n", "import sagemaker, boto3, json\n", "from sagemaker.session import Session\n", "from sagemaker.model import Model\n", "from sagemaker import image_uris, model_uris, script_uris, hyperparameters\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.utils import name_from_base\n", "\n", "\n", "sagemaker_session = Session()\n", "aws_role = sagemaker_session.get_caller_identity_arn()\n", "aws_region = boto3.Session().region_name\n", "sess = sagemaker.Session()\n", "model_version = \"*\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "embedding_model_endpoint_name: \n", " KoSimCSE-roberta-2023-05-31-08-36-23\n" ] } ], "source": [ "%store -r embedding_model_endpoint_name\n", "\n", "print(\"embedding_model_endpoint_name: \\n\", embedding_model_endpoint_name)\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Enter Model Information\n", "- Enter the SageMaker endpoint information (e.g., ARN)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "_MODEL_CONFIG_ = {\n", " \"KoAlpaca-12-8B\": {\n", " \"instance
type\": \"ml.g5.12xlarge\",\n", " \"endpoint_name\" : \"KoAlpaca-12-8B-2023-05-30-15-03-24\",\n", " \"env\": {\"TS_DEFAULT_WORKERS_PER_MODEL\": \"1\"},\n", " \"parse_function\": \"parse_response_model_KoAlpaca\",\n", " \"prompt\": \"\"\"Answer based on context:\\n\\n{context}\\n\\n{question}\"\"\",\n", " },\n", " \"KoSimCSE-roberta\": {\n", " \"instance type\": \"ml.g5.12xlarge\",\n", " \"endpoint_name\" : \"KoSimCSE-roberta-2023-05-31-08-36-23\", \n", " \"env\": {\"TS_DEFAULT_WORKERS_PER_MODEL\": \"1\"},\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. LLM Inference Test Without Context" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "prompt_wo_c: \n", " ### question: What can I sell in Amazon’s store?\n", "\n", "### answer:\n" ] } ], "source": [ "question = \"What can I sell in Amazon’s store?\"\n", "# question = \"How can I sell my product in Amazon’s stores?\"\n", "# Korean version of the question\n", "# question = \"아마존 매장에서 상품을 판매하려면 어떻게 해야 하나요?\"\n", "c = None\n", "# Korean version of the prompt\n", "# prompt_wo_c = f\"### 질문: {q}\\n\\n### 맥락: {c}\\n\\n### 답변:\" if c else f\"### 질문: {q}\\n\\n### 답변:\" \n", "# Build the prompt; the context section is included only when c is set\n", "prompt_wo_c = f\"### question: {question}\\n\\n### context: {c}\\n\\n### answer:\" if c else f\"### question: {question}\\n\\n### answer:\" \n", "print(\"prompt_wo_c: \\n\", prompt_wo_c)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "### question: What can I sell in Amazon’s store?\n", "\n", "### answer: Amazon has millions of products such as books, clothing, toys, home decor, and more available. Many of them can be found at various price points, ranging from $5 to $500K. Some popular items include baby toys, clothing, footwear, beauty products, gift sets, books, art supplies, paints, shoes, stationery, home decor, and more. You can also find them on sale occasionally as well.
Amazon is a well-known seller and retailer online whose products are available through many major shipping agents.\n", "\n", "### 답변:아마존은 세계에서 가장 큰 인터넷 쇼핑몰 중 하나입니다. Amazon은 약 10만 개의 제품을 판매하며, 연간 매출은 한화로 약 5조 원에 이릅니다. 아마존은 약 1,000억 개 이상의 제품 리뷰를 보유하고 있으며, 이는 매월 10억 개 이상의 제품이 판매된다는 것을 의미합니다. 이외에도 아마존은 수많은 개별 브랜드와의 파트너십을 통해 다양한 제품을 판매하고 있으며, 국내에서도 다양한 상품을 구매할 수 있습니다. 아마존 프라임(Amazon Prime) 회원에게는 무료 배송, 빠르고 편리한 반품 및 교환 등 다양한 혜택이 제공됩니다. \n" ] } ], "source": [ "\n", "from inference_lib import invoke_inference, query_endpoint_with_text_payload\n", "from inference_lib import parse_response_text_model\n", "\n", "model_id = \"KoAlpaca-12-8B\"\n", "endpoint_name = _MODEL_CONFIG_[model_id][\"endpoint_name\"]\n", "\n", "query_response = query_endpoint_with_text_payload(\n", " prompt_wo_c, endpoint_name=endpoint_name, \n", ")\n", "\n", "query_response = parse_response_text_model(query_response)\n", "print(query_response)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# 3. Data Preparation" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [], "source": [ "import glob\n", "import os\n", "import pandas as pd\n", "\n", "all_files = glob.glob(os.path.join(\"../Data/\", \"amazon_faq_en_resize.csv\"))\n", "\n", "df_knowledge = pd.concat(\n", " (pd.read_csv(f) for f in all_files),\n", " axis=0,\n", " ignore_index=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Drop the FAQ questions and keep the answers as the retrieval context\n", "df_knowledge.drop([\"Question\"], axis=1, inplace=True)\n", "df_knowledge.rename(columns={\"Answer\": \"Context\"}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [] }, "outputs": [], "source": [ "file_path = \"rag_data/amazon_faq_ko_processed_data.csv\"\n", "# df_knowledge.to_csv(file_path, header=False, index=False)\n", "df_knowledge.to_csv(file_path, header=True, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reference\n", "- LangChain CSV
Loader Code\n", " - https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/csv_loader.py" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='Context: Register on mazon for the flexibility to sell one item or thousands.Choose a selling plan based on your needs—you can change plans at any time.Use Seller Central to create a produc', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 0}),\n", " Document(page_content='Context: The possibilities are virtually limitless. What you can sell depends on the product, the product category, and the brand. Some categories are open to all sellers, some require a Pr', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 1}),\n", " Document(page_content='Context: \"Some products may not be listed as a matter of compliance with legal or regulatory restrictions (for example, prescription drugs) or Amazon policy (for example, crime scene photos', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 2})]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain.document_loaders.csv_loader import CSVLoader\n", "\n", "# Load each CSV row as one Document\n", "loader = CSVLoader(file_path, encoding=\"utf-8\")\n", "documents = loader.load()\n", "documents[0:3]\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# 4. Prepare the SageMaker Endpoint Wrappers" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## SageMaker LLM Wrapper" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[autoreload of inference_lib failed: Traceback (most recent call last):\n", " File \"/opt/conda/lib/python3.9/site-packages/IPython/extensions/autoreload.py\", line 245, in check\n", " superreload(m, reload, self.old_objects)\n", " File
\"/opt/conda/lib/python3.9/site-packages/IPython/extensions/autoreload.py\", line 410, in superreload\n", " update_generic(old_obj, new_obj)\n", " File \"/opt/conda/lib/python3.9/site-packages/IPython/extensions/autoreload.py\", line 347, in update_generic\n", " update(a, b)\n", " File \"/opt/conda/lib/python3.9/site-packages/IPython/extensions/autoreload.py\", line 317, in update_class\n", " update_instances(old, new)\n", " File \"/opt/conda/lib/python3.9/site-packages/IPython/extensions/autoreload.py\", line 280, in update_instances\n", " ref.__class__ = new\n", " File \"pydantic/main.py\", line 358, in pydantic.main.BaseModel.__setattr__\n", "ValueError: \"SagemakerEndpointEmbeddingsJumpStart\" object has no field \"__class__\"\n", "]\n" ] } ], "source": [ "from langchain.llms.sagemaker_endpoint import SagemakerEndpoint" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "tags": [] }, "outputs": [], "source": [ "from inference_lib import KoAlpacaContentHandler\n", "_KoAlpacaContentHandler = KoAlpacaContentHandler()" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "tags": [] }, "outputs": [], "source": [ "parameters = {}\n", "\n", "sm_llm = SagemakerEndpoint(\n", " endpoint_name=_MODEL_CONFIG_[\"KoAlpaca-12-8B\"][\"endpoint_name\"],\n", " region_name=aws_region,\n", " model_kwargs=parameters,\n", " content_handler=_KoAlpacaContentHandler,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## SageMaker Embedding Model Wrapper" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "tags": [] }, "outputs": [], "source": [ "from inference_lib import SagemakerEndpointEmbeddingsJumpStart\n", "from inference_lib import KoSimCSERobertaContentHandler" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "tags": [] }, "outputs": [], "source": [ "\n", "_KoSimCSERobertaContentHandler = KoSimCSERobertaContentHandler()\n", "\n", "# content_handler = ContentHandler()\n", "\n", "embeddings = 
SagemakerEndpointEmbeddingsJumpStart(\n", " endpoint_name=_MODEL_CONFIG_[\"KoSimCSE-roberta\"][\"endpoint_name\"],\n", " region_name=aws_region,\n", " content_handler=_KoSimCSERobertaContentHandler,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. Create the Vector Store\n", "- Create a FAISS vector store" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Import only the LangChain components used in this notebook\n", "from langchain.indexes import VectorstoreIndexCreator\n", "from langchain.vectorstores import FAISS\n", "from langchain.text_splitter import CharacterTextSplitter\n", "from langchain import PromptTemplate\n", "from langchain.chains.question_answering import load_qa_chain\n" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "tags": [] }, "outputs": [], "source": [ "index_creator = VectorstoreIndexCreator(\n", " vectorstore_cls=FAISS,\n", " embedding=embeddings,\n", " text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),\n", ")" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/root/aws-ai-ml-workshop-kr/sagemaker/generative-ai/1-Chatbot/2-Lab02-RAG-LLM/../common_code/inference_lib.py:85: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated.
If you meant to do this, you must specify 'dtype=object' when creating the ndarray.\n", " ndim = np.array(response_json).ndim\n" ] } ], "source": [ "index = index_creator.from_loaders([loader])" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{0: '9d9efa22-3fae-40a7-a6d5-6a31e4ac9a1f',\n", " 1: '0618310b-e47b-4e56-ba08-09f9c4e23a10',\n", " 2: '09fe2c67-c907-4f88-bce3-034812cd6602',\n", " 3: '49977584-12a0-450f-83d3-d61f7ec4b08b',\n", " 4: '09fa7ced-027c-4ee7-96e9-d6ef37dbde0e',\n", " 5: '526da5cd-e5c5-4e45-9dc0-1308de13d307',\n", " 6: '7eb90acd-21df-4423-92ee-29e69f521e0b',\n", " 7: 'c94f8379-a531-474a-b6a7-80e6fc4564cb',\n", " 8: 'f200b8d2-66d2-4710-8b10-7a79bbeb6a39',\n", " 9: 'bb484208-362c-4e2f-b62a-95248aa0ab67',\n", " 10: '4515fb92-1fd6-4dca-a398-0580d78f6751',\n", " 11: '0ff82af6-97e4-4b7d-8be8-d3d5c6c8199d'}" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index.vectorstore.index_to_docstore_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6. Testing the QA Application with Different Prompts" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "docsearch = FAISS.from_documents(documents, embeddings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First Question" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "question1: \n", " What can I sell in Amazon’s store?\n" ] } ], "source": [ "question1 = question\n", "print(\"question1: \\n\" , question1)" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Retrieve the top 3 most relevant documents, then send the retrieved context and the question to the LLM to get an answer."
] }, { "cell_type": "code", "execution_count": 82, "metadata": { "tags": [] }, "outputs": [], "source": [ "# docs = docsearch.similarity_search(question1, k=3)\n", "# docs" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "tags": [] }, "outputs": [], "source": [ "# def make_prompt_with_context(docs, question):\n", "# context_list = []\n", "# for doc in docs:\n", "# context = doc.page_content\n", "# # print(context) \n", "# context_list.append(context)\n", " \n", "# prompt = f\"\"\"Answer based on Context:\\n\\n### {context_list[0]}\\n\\n{context_list[1]}\\n\\n{context_list[2]}\\n\\n### Question: {question}\\n\\n### Answer:\"\"\" \n", "# print(prompt)\n", "# return prompt\n", " \n", "# prompt = make_prompt_with_context(docs, question1) " ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[(Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),\n", " 131.4393),\n", " (Document(page_content='Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),\n", " 138.26472),\n", " (Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade. 
If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10}),\n", " 153.38895)]" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs = docsearch.similarity_search_with_score(question1, k=3)\n", "docs" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "######## prompt : ########## \n", "\n", " Answer based on Context:\n", "\n", "### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e\n", "\n", "### Question: What can I sell in Amazon’s store?\n", "\n", "### Answer:\n" ] } ], "source": [ "def make_prompt(context, question):\n", "    # Build the prompt from the given context string and the question\n", "    # prompt = f'{question} 다음의 Context 를 이용하여 답해주세요. {docs[0].page_content}'\n", "    prompt = f\"\"\"Answer based on Context:\\n\\n### {context}\\n\\n### Question: {question}\\n\\n### Answer:\"\"\"\n", "    # prompt = f\"\"\"주어진 Context 에 기반하여 Question에 Answer 하세요 :\\n\\n### {context}\\n\\n### Question: {question}\\n\\n### Answer:\"\"\"\n", "    # prompt = f\"\"\"주어진 Context 에 기반하여 질문에 답변 하세요 :\\n\\n### {context}\\n\\n### 질문: {question}\\n\\n### 답변:\"\"\" \n", "    print(\"######## prompt : ########## \\n\\n\", prompt)\n", "\n", "    return prompt\n", "\n", "# docs holds (Document, score) pairs; pass the second-ranked document's text as the context\n", "prompt = make_prompt(docs[1][0].page_content, question1)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer based on Context:\n", "\n", "### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute.
In 2019, nearly 225,000 e\n", "\n", "### Question: What can I sell in Amazon’s store?\n", "\n", "### Answer: For small and medium-sized businesses, Amazon’s store is the perfect place to sell your products. The store provides a wide assortment of products such as toys, clothing, shoes, beauty products, home décor, garden products, furniture, accessories, books, magazines, computers, software, electronics, electrical appliances, food and drinks, household items, travel goods, jewelry, shoes, gifts, grocery store items, basic home supplies, and many, many more.\n" ] } ], "source": [ "query_response = query_endpoint_with_text_payload(\n", " prompt, endpoint_name=endpoint_name, \n", ")\n", "\n", "query_response = parse_response_text_model(query_response)\n", "print(query_response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second Question" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "tags": [] }, "outputs": [], "source": [ "question2 = \"How can I sell my product in Amazon’s stores?\"\n" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[(Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),\n", " 126.70239),\n", " (Document(page_content='Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),\n", " 139.49857),\n", " (Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade.
If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10}),\n", " 154.39172)]" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs = docsearch.similarity_search_with_score(question2, k=3)\n", "docs" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "######## prompt : ########## \n", "\n", " Answer based on Context:\n", "\n", "### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e\n", "\n", "### Question: What can I sell in Amazon’s store?\n", "\n", "### Answer:\n" ] } ], "source": [ "prompt = make_prompt(docs[1][0].page_content, question1)" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Answer based on Context:\n", "\n", "### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute.
In 2019, nearly 225,000 e\n", "\n", "### Question: What can I sell in Amazon’s store?\n", "\n", "### Answer: Amazon can offer various products such as home appliances, baby goods, lighting, garden decor, toys, books, home decor items, cosmetics, shoes, clothing, sporting goods, tableware, kitchen appliances, household goods, gifts, and many more.\n" ] } ], "source": [ "query_response = query_endpoint_with_text_payload(\n", " prompt, endpoint_name=endpoint_name, \n", ")\n", "\n", "query_response = parse_response_text_model(query_response)\n", "print(query_response)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using LangChain" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PromptTemplate(input_variables=['context', 'question'], output_parser=None, partial_variables={}, template='Answer based on context:\\n\\n{context}\\n\\n{question}', template_format='f-string', validate_template=True)" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prompt_template = \"\"\"Answer based on context:\\n\\n{context}\\n\\n{question}\"\"\"\n", "\n", "PROMPT = PromptTemplate(template=prompt_template, input_variables=[\"context\", \"question\"])\n", "PROMPT" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "chain = load_qa_chain(llm=sm_llm, prompt=PROMPT)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),\n", " Document(page_content='Context: Selling in Amazon’s store can be very profitable.
On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),\n", " Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10})]" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "docs = docsearch.similarity_search(question2, k=3)\n", "docs" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In KoAlpacaContentHandler\n", "response_json: [{'generated_text': '{\"text_inputs\": \"Answer based on context:\\\\n\\\\nContext: There are many opportunities for new sellers in Amazon\\\\u2019s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while\\\\n\\\\nContext: Selling in Amazon\\\\u2019s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e\\\\n\\\\nContext: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro\\\\n\\\\nWhat can I sell in Amazon\\\\u2019s store?\"}\\n```\\n그리고 위에서 언급한 것처럼 아마존은 FBA(Fulfillment by Amazon) 서비스를 제공하고 있습니다. FBA는 아마존이 판매하는 제품의 포장, 배송, 환불, 교환 등의 과정을 대행해주는 서비스입니다. 이를 이용하면 셀러는 판매에만 집중할 수 있고 재고와 배송 등은 아마존이 담당하므로 더 많은 매출을 올릴 수 있습니다. 하지만 이러한 FBA 서비스를 이용하려면 비용이 발생하며, 향후 수익이 발생하면 지불하는 방식으로 운영됩니다. 
'}]\n" ] }, { "data": { "text/plain": [ "'{\"text_inputs\": \"Answer based on context:\\\\n\\\\nContext: There are many opportunities for new sellers in Amazon\\\\u2019s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while\\\\n\\\\nContext: Selling in Amazon\\\\u2019s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e\\\\n\\\\nContext: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro\\\\n\\\\nWhat can I sell in Amazon\\\\u2019s store?\"}\\n```\\n그리고 위에서 언급한 것처럼 아마존은 FBA(Fulfillment by Amazon) 서비스를 제공하고 있습니다. FBA는 아마존이 판매하는 제품의 포장, 배송, 환불, 교환 등의 과정을 대행해주는 서비스입니다. 이를 이용하면 셀러는 판매에만 집중할 수 있고 재고와 배송 등은 아마존이 담당하므로 더 많은 매출을 올릴 수 있습니다. 하지만 이러한 FBA 서비스를 이용하려면 비용이 발생하며, 향후 수익이 발생하면 지불하는 방식으로 운영됩니다. 
'" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result = chain({\"input_documents\": docs, \"question\": question}, return_only_outputs=True)[\n", " \"output_text\"\n", "]\n", "result" ] } ], "metadata": { "instance_type": "ml.m5.large", "kernelspec": { "display_name": "Python 3 (PyTorch 1.13 Python 3.9 CPU Optimized)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/pytorch-1.13-cpu-py39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }