# LangChain Indexes Getting Started
- https://python.langchain.com/en/latest/modules/indexes/getting_started.html

# 1. Checking the input data

 722  6469 39027 data/state_of_the_union.txt
 - 722 lines
 - 6469 words
 - 39027 bytes (`wc` reports the byte count by default; `wc -m` would count characters)
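
The same counts can be reproduced in Python. The sketch below is only an illustration: it runs on a short made-up sample string rather than the speech file, and `wc_counts` is a helper name introduced here. It also shows why the byte count and the character count can differ when the text contains non-ASCII characters such as a curly apostrophe.

```python
def wc_counts(text: str, encoding: str = "utf-8") -> dict:
    """Return wc-style counts for a string (hypothetical helper)."""
    return {
        "lines": text.count("\n"),            # wc counts newline characters
        "words": len(text.split()),           # whitespace-separated tokens
        "bytes": len(text.encode(encoding)),  # what plain `wc` prints third
        "chars": len(text),                   # what `wc -m` would print
    }

# Made-up sample text; \u2019 is a curly apostrophe (3 bytes in UTF-8).
sample = "We\u2019re together again.\nMay God bless you all.\n"
counts = wc_counts(sample)
print(counts)
```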

In [69]:
! wc data/state_of_the_union.txt

  722  6469 39027 data/state_of_the_union.txt


In [65]:
! head -n3 data/state_of_the_union.txt

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 


In [66]:
! tail -n3 data/state_of_the_union.txt

The United States of America. 

May God bless you all. May God protect our troops.

# 2. OpenAI LLM

In [13]:
# import os
# os.environ["OPENAI_API_KEY"]='<Type API Key>' 


In [14]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

In [15]:
from langchain.document_loaders import TextLoader
loader = TextLoader('data/state_of_the_union.txt', encoding='utf8')

In [16]:
from langchain.indexes import VectorstoreIndexCreator

In [18]:
index = VectorstoreIndexCreator().from_loaders([loader])

In [19]:
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

In [20]:
query = "What did the president say about Ketanji Brown Jackson"
index.query_with_sources(query)

{'question': 'What did the president say about Ketanji Brown Jackson',
 'answer': " The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, one of the nation's top legal minds, to continue Justice Breyer's legacy of excellence, and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n",
 'sources': 'data/state_of_the_union.txt'}

In [21]:
index.vectorstore

<langchain.vectorstores.chroma.Chroma at 0x7fd321ca9eb0>

In [22]:
index.vectorstore.as_retriever()

VectorStoreRetriever(vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x7fd321ca9eb0>, search_type='similarity', search_kwargs={})

# 3. Walkthrough

## Loading the document file
- Splitting the document into chunks of up to 1000 characters each produces 42 chunks.

In [23]:
documents = loader.load()

In [24]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

In [34]:
print("# of texts: " , len(texts))

# of texts:  42
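
To make the chunking step more concrete, here is a simplified sketch of separator-based splitting: the text is cut on blank lines and the pieces are greedily merged until adding another piece would exceed `chunk_size`. This is an assumption about the general approach, not LangChain's actual `CharacterTextSplitter` implementation (which also handles overlap and whitespace stripping). `split_by_separator` is a hypothetical helper.

```python
def split_by_separator(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Still fits, or a single oversized piece that must be kept whole.
            current = candidate
        else:
            # Adding this piece would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

# Toy input with a tiny chunk_size so the merging is easy to follow.
toy_text = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
toy_chunks = split_by_separator(toy_text, chunk_size=10)
print(toy_chunks)
```

Because chunks are cut only at separators, the number of chunks depends on where the blank lines fall, not just on the file size divided by `chunk_size`.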


In [37]:
print("first two texts: \n" , texts[0:2])

first two texts: 
 [Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', meta

## Loading the OpenAI text-embedding-ada-002 model

In [38]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [40]:
embeddings

OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=-1, openai_api_key=None, chunk_size=1000, max_retries=6)

In [42]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(texts, embeddings)

In [43]:
db

<langchain.vectorstores.chroma.Chroma at 0x7fd3223f5be0>

## Loading the text-davinci-003 (GPT-3.5) model

In [44]:
retriever = db.as_retriever()

In [45]:
llm=OpenAI()

In [47]:
llm

OpenAI(cache=None, verbose=False, callback_manager=<langchain.callbacks.shared.SharedCallbackManager object at 0x7fd3394a7fa0>, client=<class 'openai.api_resources.completion.Completion'>, model_name='text-davinci-003', temperature=0.7, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0, n=1, best_of=1, model_kwargs={}, openai_api_key=None, batch_size=20, request_timeout=None, logit_bias={}, max_retries=6, streaming=False)

## Presumed flow: the chain takes the "question" --> retrieves similar chunks from the VectorStore --> builds a prompt --> queries the LLM

In [49]:
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
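
As a rough illustration of what a "stuff"-style chain does with the retrieved chunks, the sketch below concatenates them into a single prompt ahead of the question. The template wording, `build_stuff_prompt`, and `fake_llm` are assumptions made for this example; the real prompt used by `RetrievalQA` may differ, and no API call is made.

```python
from typing import List

def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """'Stuff' every retrieved chunk into one prompt (illustrative template)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )

def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM call so the sketch runs offline."""
    return f"(an LLM would answer here; prompt length = {len(prompt)} characters)"

# Two short excerpts standing in for the chunks a retriever would return.
retrieved = [
    "I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.",
    "A former top litigator in private practice. A consensus builder.",
]
question = "What did the president say about Ketanji Brown Jackson"
prompt = build_stuff_prompt(question, retrieved)
print(prompt)
print(fake_llm(prompt))
```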

In [50]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said she is a consensus builder and has received a broad range of support since being nominated."

## Creating a VectorstoreIndexCreator
- A wrapper around the steps above

In [51]:
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Chroma, 
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
)

In [54]:
index_creator.embedding

OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', document_model_name='text-embedding-ada-002', query_model_name='text-embedding-ada-002', embedding_ctx_length=-1, openai_api_key=None, chunk_size=1000, max_retries=6)

In [57]:
index = index_creator.from_loaders([loader])

In [58]:
index

VectorStoreIndexWrapper(vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x7fd32235d9d0>)

## Semantic search
- Question: "What did the president say about Ketanji Brown Jackson"
- Returns the 4 most similar documents

In [73]:
index.vectorstore.similarity_search("What did the president say about Ketanji Brown Jackson")

[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'data/state_of_the_union.txt'}),
 Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of publ
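
The idea behind returning the 4 most similar documents can be sketched with toy vectors: score every stored vector against the query vector and keep the best `k`. The example assumes cosine similarity as the metric, which may not be what the vector store actually uses internally, and the 2-dimensional vectors are made up (real ada-002 embeddings have 1536 dimensions). `top_k` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Made-up 2-D "embeddings" for five documents and one query.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [2.0, 1.0]]
toy_query = [1.0, 0.0]
nearest = top_k(toy_query, toy_docs, k=4)
print(nearest)
```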

## Question --> Answer analysis
- The question and the generated answer are shown below

```python
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said she is a consensus builder and has received a broad range of support since being nominated."
```

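#### The semantic search above ("What did the president say about Ketanji Brown Jackson")
- Of the 4 documents returned by the semantic search, the first and second are shown below; the text in <font color="red">red</font> appears to be the context the LLM drew on to generate its answer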

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals <font color="red">Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.</font>

<font color="red">A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated</font>, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.


### Additional semantic search example: "living with COVID-19"

In [64]:
index.vectorstore.similarity_search("living with COVID-19")


[Document(page_content='And based on the projections, more of the country will reach that point across the next couple of weeks. \n\nThanks to the progress we have made this past year, COVID-19 need no longer control our lives.  \n\nI know some are talking about “living with COVID-19”. Tonight – I say that we will never just accept living with COVID-19. \n\nWe will continue to combat the virus as we do other diseases. And because this is a virus that mutates and spreads, we will stay on guard. \n\nHere are four common sense steps as we move forward safely.  \n\nFirst, stay protected with vaccines and treatments. We know how incredibly effective vaccines are. If you’re vaccinated and boosted you have the highest degree of protection. \n\nWe will never give up on vaccinating more Americans. Now, I know parents with kids under 5 are eager to see a vaccine authorized for their children. \n\nThe scientists are working hard to get that done and we’ll be ready with plenty of vaccines when the

# 4. Using FAISS
- Reference: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/faiss.html

In [76]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader

In [78]:
db_FAISS = FAISS.from_documents(texts, embeddings)

In [79]:
db_FAISS.index_to_docstore_id

{0: '4187ca80-7460-423a-b370-84a4ab647983',
 1: 'e67fd22c-7288-4b46-8657-3bdb3bc6b9f5',
 2: '969b48d3-89eb-4b23-9b57-41466a47b868',
 3: 'f9c12189-c88f-444f-a9ec-84ef13302117',
 4: '420d8325-2891-467b-9257-d9994fe24ff3',
 5: '82b5cc00-c784-4fdc-a778-ceee4c69ca01',
 6: '0e6a2770-3e5f-4c06-add4-26dc445ca21d',
 7: '03055cdd-0095-4e52-8ce0-59e45af41431',
 8: 'ddcc07d4-0207-4cc7-b08b-5a35e458dc13',
 9: '75c2c84b-7aaf-4442-b073-913ccba68a8e',
 10: 'd9b6f326-1f3a-4266-a75c-3d6633eeceb7',
 11: '7ebf917c-1689-4241-a21e-187b6920ffcc',
 12: 'd990ab83-586d-4a77-880d-dcb3cfb954aa',
 13: '0550832e-7bc7-482e-b6df-9f2bf8110aed',
 14: '6d6e018d-b849-4507-babf-da2a604a3b66',
 15: 'f0e52fa5-7a98-4453-9dda-7052686edac0',
 16: 'c5a789e9-512d-4086-9016-5547cc53e69b',
 17: 'f49b761e-4745-4bcd-9b4b-103ffde21a50',
 18: '5a22d6cf-de54-4a9f-bb93-92811587e17b',
 19: '4a244f2b-0e7c-453f-9c41-881af19f4a8a',
 20: '0c6cdc7b-2146-4c6d-8526-5631de41227b',
 21: '5f0895ae-eeb6-44f7-a9da-17e6b01edf71',
 22: '0f6bd6dd-cdb1-

In [83]:
query = "What did the president say about Ketanji Brown Jackson"
docs = db_FAISS.similarity_search_with_score(query)

In [84]:
docs

[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'data/state_of_the_union.txt'}),
  0.36927557),
 (Document(page_content='A former top litigator in private practice. A former federal public defender. And from
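
The number paired with each document is a distance, so a lower score means a closer match. Assuming the score is a squared L2 distance and the embeddings are unit-length (both are assumptions about the default setup, not something the output above confirms), the distance `d` and the cosine similarity `cos` are related by `d = 2 - 2 * cos`. Under that assumption, a score of 0.369 would correspond to a cosine similarity of roughly 0.82. The sketch below checks the identity on made-up 3-dimensional vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(x * y for x, y in zip(a, b))

# Made-up vectors, normalized so the identity d = 2 - 2 * cos applies.
query_vec = normalize([1.0, 2.0, 2.0])
doc_vec = normalize([2.0, 1.0, 2.0])

distance = squared_l2(query_vec, doc_vec)
similarity = dot(query_vec, doc_vec)
print(distance, 2 - 2 * similarity)
```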

In [88]:
embedding_vector = embeddings.embed_query(query)
print("query: \n", query)
print("embedding_vector length: ", len(embedding_vector))
print("A little bit of embedding_vector : ", embedding_vector[0:5])


query: 
 What did the president say about Ketanji Brown Jackson
embedding_vector length:  1536
A little bit of embedding_vector :  [-0.052522048354148865, -0.00884410459548235, -0.008728805929422379, 0.007738592103123665, 0.0037912996485829353]


In [89]:
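# Query the FAISS index directly with the precomputed query embedding;
# this call returns documents only, without scores.
docs_by_vector = db_FAISS.similarity_search_by_vector(embedding_vector)
docs_by_vector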

[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'data/state_of_the_union.txt'}),
 Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of publ