# Question Answering with RAG (Retrieval-Augmented Generation) on User Data
- Original code
 - https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb

# 1. Basic Environment Setup

In [2]:
%load_ext autoreload
%autoreload 2

# Add the shared code folder to the import path
import sys
sys.path.append('../common_code')

In [3]:
import time
import sagemaker, boto3, json
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base


sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()
model_version = "*"

In [4]:
%store -r embedding_model_endpoint_name

print("embedding_model_endpoint_name: \n", embedding_model_endpoint_name)


embedding_model_endpoint_name: 
 KoSimCSE-roberta-2023-05-31-08-36-23


## Model Information
- Enter the SageMaker endpoint details, such as the endpoint name

In [5]:
_MODEL_CONFIG_ = {
 "KoAlpaca-12-8B": {
 "instance type": "ml.g5.12xlarge",
 "endpoint_name" : "KoAlpaca-12-8B-2023-05-30-15-03-24",
 "env": {"TS_DEFAULT_WORKERS_PER_MODEL": "1"},
 "parse_function": "parse_response_model_KoAlpaca",
 "prompt": """Answer based on context:\n\n{context}\n\n{question}""",
 },
 "KoSimCSE-roberta": {
 "instance type": "ml.g5.12xlarge",
 "endpoint_name" : "KoSimCSE-roberta-2023-05-31-08-36-23", 
 "env": {"TS_DEFAULT_WORKERS_PER_MODEL": "1"},
 },
}
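The config above is read by key in later cells, so a mistyped model ID would surface as a bare `KeyError`. As a minimal sketch, a small lookup helper can give a clearer message. The helper name `get_endpoint_name` and the trimmed `MODEL_CONFIG` copy are assumptions for illustration and are not part of the notebook.

```python
# Trimmed copy of the notebook's config, kept inline so the sketch runs on its own.
MODEL_CONFIG = {
    "KoAlpaca-12-8B": {"endpoint_name": "KoAlpaca-12-8B-2023-05-30-15-03-24"},
    "KoSimCSE-roberta": {"endpoint_name": "KoSimCSE-roberta-2023-05-31-08-36-23"},
}


def get_endpoint_name(model_id, config=MODEL_CONFIG):
    """Return the endpoint name for model_id, or raise a readable error."""
    if model_id not in config:
        known = ", ".join(sorted(config))
        raise KeyError(f"Unknown model_id {model_id!r}; expected one of: {known}")
    return config[model_id]["endpoint_name"]


print(get_endpoint_name("KoAlpaca-12-8B"))
```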

# 2. Inference Test Without Context

In [6]:
question = "What can I sell in Amazon’s store?"
# question = "How can I sell my product in Amazon’s stores?"
# question = "아마존 매장에서 상품을 판매하려면 어떻게 해야 하나요?"
c = None
# prompt_wo_c = f"### 질문: {q}\n\n### 맥락: {c}\n\n### 답변:" if c else f"### 질문: {q}\n\n### 답변:" 
prompt_wo_c = f"### question: {question}\n\n### context: {c}\n\n### answer:" if c else f"### question: {question}\n\n### answer:" 
print("prompt_wo_c: \n", prompt_wo_c)

prompt_wo_c: 
 ### question: What can I sell in Amazon’s store?

### answer:
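The conditional f-string in the cell above can also be written as a small function, which makes the two prompt shapes easier to reuse. This is a sketch only; the name `build_prompt` is an assumption and does not come from `inference_lib`.

```python
def build_prompt(question, context=None):
    """Format a question, adding a context section only when one is given."""
    if context:
        return f"### question: {question}\n\n### context: {context}\n\n### answer:"
    return f"### question: {question}\n\n### answer:"


# Without context, the result has the same layout as prompt_wo_c.
print(build_prompt("What can I sell in Amazon's store?"))
```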


In [7]:

from inference_lib import invoke_inference, query_endpoint_with_text_payload
from inference_lib import parse_response_text_model

model_id = "KoAlpaca-12-8B"
endpoint_name = _MODEL_CONFIG_[model_id]["endpoint_name"]

query_response = query_endpoint_with_text_payload(
 prompt_wo_c, endpoint_name=endpoint_name, 
)

query_response = parse_response_text_model(query_response)
print(query_response)

### question: What can I sell in Amazon’s store?

### answer: Amazon has millions of products such as books, clothing, toys, home decor, and more available. Many of them can be found at various price points, ranging from $5 to $500K. Some popular items include baby toys, clothing, footwear, beauty products, gift sets, books, art supplies, paints, shoes, stationery, home decor, and more. You can also find them on sale occasionally as well. Amazon is a well-known seller and retailer online whose products are available through many major shipping agents.

### 답변:아마존은 세계에서 가장 큰 인터넷 쇼핑몰 중 하나입니다. Amazon은 약 10만 개의 제품을 판매하며, 연간 매출은 한화로 약 5조 원에 이릅니다. 아마존은 약 1,000억 개 이상의 제품 리뷰를 보유하고 있으며, 이는 매월 10억 개 이상의 제품이 판매된다는 것을 의미합니다. 이외에도 아마존은 수많은 개별 브랜드와의 파트너십을 통해 다양한 제품을 판매하고 있으며, 국내에서도 다양한 상품을 구매할 수 있습니다. 아마존 프라임(Amazon Prime) 회원에게는 무료 배송, 빠르고 편리한 반품 및 교환 등 다양한 혜택이 제공됩니다. 


# 3. Data Preparation

In [8]:
import glob
import os
import pandas as pd

all_files = glob.glob(os.path.join("../Data/", "amazon_faq_en_resize.csv"))

df_knowledge = pd.concat(
 (pd.read_csv(f) for f in all_files),
 axis=0,
 ignore_index=True,
)

In [9]:
df_knowledge.drop(["Question"], axis=1, inplace=True)
df_knowledge.rename(columns={"Answer": "Context"}, inplace=True)

In [10]:
file_path = "rag_data/amazon_faq_ko_processed_data.csv"
# df_knowledge.to_csv(file_path, header=False, index=False)
df_knowledge.to_csv(file_path, header=True, index=False)

Reference
- LangChain CSV loader source code
 - https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/csv_loader.py

In [11]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path, encoding="utf-8")
documents = loader.load()
documents[0:3]


[Document(page_content='Context: Register on mazon for the flexibility to sell one item or thousands.Choose a selling plan based on your needs—you can change plans at any time.Use Seller Central to create a produc', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 0}),
 Document(page_content='Context: The possibilities are virtually limitless. What you can sell depends on the product, the product category, and the brand. Some categories are open to all sellers, some require a Pr', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 1}),
 Document(page_content='Context: "Some products may not be listed as a matter of compliance with legal or regulatory restrictions (for example, prescription drugs) or Amazon policy (for example, crime scene photos', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 2})]
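In the output above, each CSV row becomes one document whose text has the form `column: value`, with the row index stored in the metadata. The sketch below reproduces that layout with only the standard library. It illustrates the idea and is not the LangChain implementation; `rows_to_documents` and the sample CSV text are made up for this example.

```python
import csv
import io


def rows_to_documents(csv_text, source="in-memory.csv"):
    """Mimic the row-per-document layout seen in the loader output."""
    documents = []
    for row_index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Each column is rendered as "name: value", one column per line.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": page_content, "metadata": {"source": source, "row": row_index}}
        )
    return documents


sample_csv = "Context\nFirst answer text\nSecond answer text\n"
sample_docs = rows_to_documents(sample_csv)
print(sample_docs[0]["page_content"])
```

Because the file was saved with `header=True`, the single `Context` column name is what ends up as the prefix of every `page_content`.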

# 4. Preparing the SageMaker Endpoint Wrappers

## SageMaker LLM Wrapper

In [71]:
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint



In [72]:
from inference_lib import KoAlpacaContentHandler
_KoAlpacaContentHandler = KoAlpacaContentHandler()

In [73]:
parameters = {}

sm_llm = SagemakerEndpoint(
 endpoint_name=_MODEL_CONFIG_["KoAlpaca-12-8B"]["endpoint_name"],
 region_name=aws_region,
 model_kwargs=parameters,
 content_handler=_KoAlpacaContentHandler,
)
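The content handler passed to `SagemakerEndpoint` converts a prompt into the request body and converts the response body back into text. `KoAlpacaContentHandler` lives in `inference_lib` and is not shown here, so the class below is a hypothetical stand-in. The `text_inputs` and `generated_text` field names are assumptions based on the response printed later in this notebook, and `max_new_tokens` is only an example parameter.

```python
import io
import json


class SketchContentHandler:
    """Hypothetical stand-in for a content handler; not the inference_lib class."""

    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        # Serialize the prompt and any generation parameters into a JSON request body.
        payload = {"text_inputs": prompt, **model_kwargs}
        return json.dumps(payload).encode("utf-8")

    def transform_output(self, output):
        # `output` is a file-like response body; read it and pull out the generated text.
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generated_text"]


handler = SketchContentHandler()
body = handler.transform_input("Hello", {"max_new_tokens": 64})
fake_response = io.BytesIO(json.dumps([{"generated_text": "Hi there"}]).encode("utf-8"))
answer = handler.transform_output(fake_response)
print(body, answer)
```

Keeping serialization in one class means the endpoint wrapper itself stays independent of the model's request format.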

## SageMaker Embedding Model Wrapper

In [74]:
from inference_lib import SagemakerEndpointEmbeddingsJumpStart
from inference_lib import KoSimCSERobertaContentHandler

In [75]:

_KoSimCSERobertaContentHandler = KoSimCSERobertaContentHandler()

# content_handler = ContentHandler()

embeddings = SagemakerEndpointEmbeddingsJumpStart(
 endpoint_name=_MODEL_CONFIG_["KoSimCSE-roberta"]["endpoint_name"],
 region_name=aws_region,
 content_handler=_KoSimCSERobertaContentHandler,
)

# 5. Creating the Vector Store
- Create a FAISS vector store

In [76]:
# Import only the LangChain components used in the rest of this notebook.
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS
from langchain.text_splitter import CharacterTextSplitter
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain


In [77]:
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=300, chunk_overlap=0),
)

In [78]:
index = index_creator.from_loaders([loader])



In [79]:
index.vectorstore.index_to_docstore_id

{0: '9d9efa22-3fae-40a7-a6d5-6a31e4ac9a1f',
 1: '0618310b-e47b-4e56-ba08-09f9c4e23a10',
 2: '09fe2c67-c907-4f88-bce3-034812cd6602',
 3: '49977584-12a0-450f-83d3-d61f7ec4b08b',
 4: '09fa7ced-027c-4ee7-96e9-d6ef37dbde0e',
 5: '526da5cd-e5c5-4e45-9dc0-1308de13d307',
 6: '7eb90acd-21df-4423-92ee-29e69f521e0b',
 7: 'c94f8379-a531-474a-b6a7-80e6fc4564cb',
 8: 'f200b8d2-66d2-4710-8b10-7a79bbeb6a39',
 9: 'bb484208-362c-4e2f-b62a-95248aa0ab67',
 10: '4515fb92-1fd6-4dca-a398-0580d78f6751',
 11: '0ff82af6-97e4-4b7d-8be8-d3d5c6c8199d'}

# 6. Testing the QA Application with Different Prompts

In [80]:
docsearch = FAISS.from_documents(documents, embeddings)

## First Question

In [81]:
question1 = question
print("question1: \n", question1)

question1: 
 What can I sell in Amazon’s store?


Retrieve the top 3 most relevant documents, then send them together with the question to the LLM to get an answer.

In [82]:
# docs = docsearch.similarity_search(question1, k=3)
# docs

In [83]:
# def make_prompt_with_context(docs, question):
# context_list = []
# for doc in docs:
# context = doc.page_content
# # print(context) 
# context_list.append(context)
 
# prompt = f"""Answer based on Context:\n\n### {context_list[0]}\n\n{context_list[1]}\n\n{context_list[2]}\n\n### Question: {question}\n\n### Answer:""" 
# print(prompt)
# return prompt
 
# prompt = make_prompt_with_context(docs, question1) 
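The commented-out helper above indexes exactly three contexts, so it would fail if fewer documents were retrieved. A minimal sketch that accepts any number of contexts is shown below; `make_prompt_with_contexts` is a hypothetical name, and the prompt wording is copied from the commented-out version.

```python
def make_prompt_with_contexts(contexts, question):
    """Join every retrieved context into one prompt ahead of the question."""
    joined = "\n\n".join(contexts)
    return f"Answer based on Context:\n\n### {joined}\n\n### Question: {question}\n\n### Answer:"


example = make_prompt_with_contexts(
    ["Context: first passage", "Context: second passage"],
    "What can I sell?",
)
print(example)
```

With LangChain documents, the list could be built as `[doc.page_content for doc in docs]`.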

In [84]:
docs = docsearch.similarity_search_with_score(question1, k=3)
docs

[(Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),
 131.4393),
 (Document(page_content='Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),
 138.26472),
 (Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10}),
 153.38895)]
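The scores in this output grow from the first result to the last, which is consistent with a distance measure where a lower score means a closer match. The notebook does not state which metric the index uses, so the sketch below assumes squared L2 distance and uses toy two-dimensional vectors to show how such a ranking works. The function names are illustrative only.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_index, distance) pairs sorted from nearest to farthest."""
    scored = [(i, squared_l2(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


toy_docs = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
results = top_k([0.0, 0.0], toy_docs, k=3)
print(results)
```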

In [85]:
def make_prompt(context, question):
    # Build a prompt that asks the model to answer from a single retrieved context.
    prompt = f"""Answer based on Context:\n\n### {context}\n\n### Question: {question}\n\n### Answer:"""
    # Korean-language prompt variants, kept for reference:
    # prompt = f'{question} 다음의 Context 를 이용하여 답해주세요. {context}'
    # prompt = f"""주어진 Context 에 기반하여 Question에 Answer 하세요 :\n\n### {context}\n\n### Question: {question}\n\n### Answer:"""
    # prompt = f"""주어진 Context 에 기반하여 질문에 답변 하세요 :\n\n### {context}\n\n### 질문: {question}\n\n### 답변:"""
    print("######## prompt : ########## \n\n", prompt)

    return prompt

# Use the second-ranked document (row 6) as the context.
prompt = make_prompt(docs[1][0].page_content, question1)

######## prompt : ########## 

 Answer based on Context:

### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e

### Question: What can I sell in Amazon’s store?

### Answer:


In [86]:
query_response = query_endpoint_with_text_payload(
 prompt, endpoint_name=endpoint_name, 
)

query_response = parse_response_text_model(query_response)
print(query_response)

Answer based on Context:

### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e

### Question: What can I sell in Amazon’s store?

### Answer: For small and medium-sized businesses, Amazon’s store is the perfect place to sell your products. The store provides a wide assortment of products such as toys, clothing, shoes, beauty products, home décor, garden products, furniture, accessories, books, magazines, computers, software, electronics, electrical appliances, food and drinks, household items, travel goods, jewelry, shoes, gifts, grocery store items, basic home supplies, and many, many more.
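The response above repeats the full prompt before the answer. As a sketch, the answer can be separated by splitting on the last `### Answer:` marker. `extract_answer` is a hypothetical helper, not part of `inference_lib`, and it assumes the model echoes the marker unchanged.

```python
def extract_answer(generated_text, marker="### Answer:"):
    """Return the text after the last marker, or the whole text if it is absent."""
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()


sample = "Answer based on Context:\n\n### Question: Q\n\n### Answer: Sell books."
print(extract_answer(sample))
```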


## Second Question

In [87]:
question2 = "How can I sell my product in Amazon’s stores?"


In [88]:
docs = docsearch.similarity_search_with_score(question2, k=3)
docs

[(Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),
 126.70239),
 (Document(page_content='Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),
 139.49857),
 (Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10}),
 154.39172)]

In [89]:
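# Reuse the second-ranked document (row 6) as the context.
# This call passes question1, so the prompt and answer below repeat the first
# question; pass question2 instead to ask the second question.
prompt = make_prompt(docs[1][0].page_content, question1)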

######## prompt : ########## 

 Answer based on Context:

### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e

### Question: What can I sell in Amazon’s store?

### Answer:


In [90]:
query_response = query_endpoint_with_text_payload(
 prompt, endpoint_name=endpoint_name, 
)

query_response = parse_response_text_model(query_response)
print(query_response)

Answer based on Context:

### Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e

### Question: What can I sell in Amazon’s store?

### Answer: Amazon can offer various products such as home appliances, baby goods, lighting, garden decor, toys, books, home decor items, cosmetics, shoes, clothing, sporting goods, tableware, kitchen appliances, household goods, gifts, and many more.


# 7. Using LangChain

In [91]:
prompt_template = """Answer based on context:\n\n{context}\n\n{question}"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
PROMPT

PromptTemplate(input_variables=['context', 'question'], output_parser=None, partial_variables={}, template='Answer based on context:\n\n{context}\n\n{question}', template_format='f-string', validate_template=True)
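When this template is used with a QA chain that puts all retrieved documents into a single prompt, the document texts are combined into one `{context}` string before the template is filled. The sketch below imitates that step with plain string formatting. The `"\n\n"` separator is an assumption, and `stuff_prompt` is a hypothetical name rather than a LangChain function.

```python
PROMPT_TEMPLATE = "Answer based on context:\n\n{context}\n\n{question}"


def stuff_prompt(page_contents, question, separator="\n\n"):
    """Join document texts into one context string and fill the template."""
    context = separator.join(page_contents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


filled = stuff_prompt(["Context: A", "Context: B"], "What can I sell?")
print(filled)
```

Because every document is concatenated, the prompt length grows with `k`, which matters for models with a short context window.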

In [92]:
chain = load_qa_chain(llm=sm_llm, prompt=PROMPT)

In [93]:
docs = docsearch.similarity_search(question2, k=3)
docs

[Document(page_content='Context: There are many opportunities for new sellers in Amazon’s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 4}),
 Document(page_content='Context: Selling in Amazon’s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 6}),
 Document(page_content='Context: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro', metadata={'source': 'rag_data/amazon_faq_ko_processed_data.csv', 'row': 10})]

In [94]:
result = chain({"input_documents": docs, "question": question}, return_only_outputs=True)[
 "output_text"
]
result

'{"text_inputs": "Answer based on context:\\n\\nContext: There are many opportunities for new sellers in Amazon\\u2019s store. What you can sell depends on the product, the category, and the brand. Some categories are open to all sellers while\\n\\nContext: Selling in Amazon\\u2019s store can be very profitable. On average, American small- and medium-sized businesses (SMBs) sell more than 6,500 products per minute. In 2019, nearly 225,000 e\\n\\nContext: Once you create a selling account, submit an application to join Amazon Handmade. If you are approved, you will have the ability to create a store and list products through the Pro\\n\\nWhat can I sell in Amazon\\u2019s store?"}\n```\n그리고 위에서 언급한 것처럼 아마존은 FBA(Fulfillment by Amazon) 서비스를 제공하고 있습니다. FBA는 아마존이 판매하는 제품의 포장, 배송, 환불, 교환 등의 과정을 대행해주는 서비스입니다. 이를 이용하면 셀러는 판매에만 집중할 수 있고 재고와 배송 등은 아마존이 담당하므로 더 많은 매출을 올릴 수 있습니다. 하지만 이러한 FBA 서비스를 이용하려면 비용이 발생하며, 향후 수익이 발생하면 지불하는 방식으로 운영됩니다. '