{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Summarization on Custom Dataset with SageMaker Jumpstart and [LangChain](https://python.langchain.com/en/latest/index.html) Library\n", "\n", "Reference: https://github.com/gkamradt/langchain-tutorials/tree/main/data_generation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " There are two main types of methods for summarizing text: abstractive and extractive.\n", "\n", "Abstractive summarization generates a new shorter summary in its own words based on understanding the meaning and concepts of the original text. It analyzes the text using advanced natural language techniques to grasp the key ideas and then expresses those ideas in a summarized form using different words and phrases. This is similar to how humans summarize by reading something and then explaining the main points in their own words.\n", "\n", "Extractive summarization works by selecting the most important sentences, phrases or words from the original text to construct a summary. It calculates the weight or importance of each part of the text using algorithms and then chooses the parts with the highest weights to put into the summary. This pulls summarizes by extracting key elements from the text itself rather than interpreting the meaning.\n", "\n", "So in short, abstractive summarization rewrites the key ideas in new words while extractive summarization selects the most salient parts of the existing text. Both aim to distill the essence and most significant information from the original document into a condensed summary.\n", "\n", "We're going to run through 3 methods for summarization that start with basic prompting to summarizing large documents using `map_reduce` method. These aren't the only options, feel free to modify it based on your use case. \n", "\n", "**3 Levels Of Summarization:**\n", "1. **Summarize a couple sentences** - Basic Prompt\n", "2. **Summarize a couple paragraphs** - Prompt Templates\n", "3. **Summarize a large document with multiple pages** - Map Reduce\n", "\n", "In this notebook we will demonstrate how to use **Flan T5 XXL**, and **Flan T5 UL2** for text summarization using a library of documents as a reference.\n", "\n", "**This notebook serves a template such that you can easily replace the example dataset by your own to build a custom text summarization application.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy large language model (LLM) and embedding model in SageMaker JumpStart\n", "\n", "To better illustrate the idea, let's first deploy all the models that are required to perform the demo. You can choose either deploying all three Flan T5 XL, BloomZ 7B1, and Flan UL2 models as the large language model (LLM) to compare their model performances, or select **subset** of the models based on your preference. To do that, you need modify the `_MODEL_CONFIG_` python dictionary defined as below." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[33mWARNING: jsonschema 3.2.0 does not provide the extra 'format-nongpl'\u001b[0m\u001b[33m\n", "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "jupyterlab 3.2.1 requires jupyter-server~=1.4, but you have jupyter-server 2.5.0 which is incompatible.\n", "jupyterlab-server 2.8.2 requires jupyter-server~=1.4, but you have jupyter-server 2.5.0 which is incompatible.\n", "docker-compose 1.29.2 requires PyYAML<6,>=3.10, but you have pyyaml 6.0 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "daal4py 2021.3.0 requires daal==2021.2.3, which is not installed.\n", "scipy 1.7.1 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.23.5 which is incompatible.\n", "numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.23.5 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "!pip install --upgrade pip\n", "!pip install --upgrade sagemaker --quiet\n", "!pip install ipywidgets==7.0.0 --quiet\n", "!pip install langchain==0.0.148 --quiet\n", "!pip install faiss-cpu --quiet\n", "!pip install pytesseract --quiet\n", "!pip install unstructured --quiet\n", "!pip install transformers --quiet\n", "!pip install datasets --quiet" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "!pip install datasets --quiet" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5\n", " warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n" ] } ], "source": [ "import time\n", "import sagemaker, boto3, json\n", "from sagemaker.session import Session\n", "from sagemaker.model import Model\n", "from sagemaker import image_uris, model_uris, script_uris, hyperparameters\n", "from sagemaker.predictor import Predictor\n", "from sagemaker.utils import name_from_base\n", "from typing import Any, Dict, List, Optional\n", "from langchain.embeddings import SagemakerEndpointEmbeddings\n", "from langchain.llms.sagemaker_endpoint import ContentHandlerBase\n", "from langchain import PromptTemplate\n", "\n", "sagemaker_session = Session()\n", "aws_role = sagemaker_session.get_caller_identity_arn()\n", "aws_region = boto3.Session().region_name\n", "sess = sagemaker.Session()\n", "model_version = \"*\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "def query_endpoint_with_json_payload(encoded_json, endpoint_name, content_type=\"application/json\"):\n", " client = boto3.client(\"runtime.sagemaker\")\n", " response = client.invoke_endpoint(\n", " 
"        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json\n", "    )\n", "    return response\n", "\n", "\n", "def parse_response_model_flan_t5(query_response):\n", "    model_predictions = json.loads(query_response[\"Body\"].read())\n", "    generated_text = model_predictions[\"generated_texts\"]\n", "    return generated_text\n", "\n", "\n", "def parse_response_multiple_texts_bloomz(query_response):\n", "    generated_text = []\n", "    model_predictions = json.loads(query_response[\"Body\"].read())\n", "    for x in model_predictions[0]:\n", "        generated_text.append(x[\"generated_text\"])\n", "    return generated_text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Deploy SageMaker endpoint(s) for the large language model(s). Please uncomment the entries below if you want to deploy multiple LLMs to compare their performance." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [] }, "outputs": [], "source": [ "_MODEL_CONFIG_ = {\n", "    \"huggingface-text2text-flan-t5-xl\": {\n", "        \"instance type\": \"ml.p3.2xlarge\",\n", "        \"env\": {\"TS_DEFAULT_WORKERS_PER_MODEL\": \"1\"},\n", "        \"parse_function\": parse_response_model_flan_t5,\n", "        \"prompt\": \"\"\"Write a brief summary of the following text:\\n\\n{context}\"\"\",\n", "    },\n", "    # \"huggingface-textgeneration1-bloomz-7b1-fp16\": {\n", "    #     \"instance type\": \"ml.g5.12xlarge\",\n", "    #     \"env\": {},\n", "    #     \"parse_function\": parse_response_multiple_texts_bloomz,\n", "    #     \"prompt\": \"\"\"question: \\\"{question}\"\\\\n\\nContext: \\\"{context}\"\\\\n\\nAnswer:\"\"\",\n", "    # },\n", "    # \"huggingface-text2text-flan-ul2-bf16\": {\n", "    #     \"instance type\": \"ml.g5.24xlarge\",\n", "    #     \"env\": {\"TS_DEFAULT_WORKERS_PER_MODEL\": \"1\"},\n", "    #     \"parse_function\": parse_response_model_flan_t5,\n", "    #     \"prompt\": \"\"\"Answer based on context:\\n\\n{context}\\n\\n{question}\"\"\",\n", "    # },\n", "}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-------------!\u001b[1mModel huggingface-text2text-flan-t5-xl has been deployed successfully.\u001b[0m\n", "\n" ] } ], "source": [ "newline, bold, unbold = \"\\n\", \"\\033[1m\", \"\\033[0m\"\n", "\n", "for model_id in _MODEL_CONFIG_:\n", "    endpoint_name = name_from_base(f\"fm-text-summary-{model_id}\")\n", "    inference_instance_type = _MODEL_CONFIG_[model_id][\"instance type\"]\n", "\n", "    # Retrieve the inference container uri. This is the base HuggingFace container image for the default model above.\n",
"    deploy_image_uri = image_uris.retrieve(\n", "        region=None,\n", "        framework=None,  # automatically inferred from model_id\n", "        image_scope=\"inference\",\n", "        model_id=model_id,\n", "        model_version=model_version,\n", "        instance_type=inference_instance_type,\n", "    )\n", "    # Retrieve the model uri.\n", "    model_uri = model_uris.retrieve(\n", "        model_id=model_id, model_version=model_version, model_scope=\"inference\"\n", "    )\n", "    model_inference = Model(\n", "        image_uri=deploy_image_uri,\n", "        model_data=model_uri,\n", "        role=aws_role,\n", "        predictor_cls=Predictor,\n", "        name=endpoint_name,\n", "        env=_MODEL_CONFIG_[model_id][\"env\"],\n", "    )\n", "    model_predictor_inference = model_inference.deploy(\n", "        initial_instance_count=1,\n", "        instance_type=inference_instance_type,\n", "        predictor_cls=Predictor,\n", "        endpoint_name=endpoint_name,\n", "    )\n", "    print(f\"{bold}Model {model_id} has been deployed successfully.{unbold}{newline}\")\n", "    _MODEL_CONFIG_[model_id][\"endpoint_name\"] = endpoint_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarize a couple of sentences" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [] }, "outputs": [], "source": [ "prompt = \"\"\"\n", "Please provide a summary of the following text\n", "\n", "TEXT:\n", "Philosophy (from Greek: φιλοσοφία, philosophia, 'love of wisdom') \\\n", "is the systematized study of general and fundamental questions, \\\n", "such as those about existence, reason, knowledge, values, mind, and language. \\\n", "Some sources claim the term was coined by Pythagoras (c. 570 – c. 495 BCE), \\\n", "although this theory is disputed by some. Philosophical methods include questioning, \\\n", "critical discussion, rational argument, and systematic presentation.\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we wrap our SageMaker endpoint for the LLM in `langchain.llms.sagemaker_endpoint.SagemakerEndpoint`.
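\n", "\n", "The `ContentHandler` in the next cell implements the endpoint's JSON contract: the request carries the prompt as `text_inputs` alongside the generation parameters, and the response returns `generated_texts` (the same shape the parse helpers above assume). A minimal sketch of the two payloads, with illustrative values:\n", "\n", "```python\n", "# Request body the content handler sends to the endpoint\n", "request = {\"text_inputs\": \"<prompt>\", \"max_length\": 200, \"temperature\": 1}\n", "# Response body the endpoint returns; the handler reads generated_texts[0]\n", "response = {\"generated_texts\": [\"<summary>\"]}\n", "```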
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint\n", "\n", "parameters = {\n", " \"max_length\": 200,\n", " \"num_return_sequences\": 1,\n", " \"top_k\": 250,\n", " \"top_p\": 0.95,\n", " \"do_sample\": False,\n", " \"temperature\": 1,\n", "}\n", "\n", "class ContentHandler(LLMContentHandler):\n", " content_type = \"application/json\"\n", " accepts = \"application/json\"\n", "\n", " def transform_input(self, prompt: str, model_kwargs={}) -> bytes:\n", " input_str = json.dumps({\"text_inputs\": prompt, **model_kwargs})\n", " return input_str.encode(\"utf-8\")\n", "\n", " def transform_output(self, output: bytes) -> str:\n", " response_json = json.loads(output.read().decode(\"utf-8\"))\n", " return response_json[\"generated_texts\"][0]\n", "\n", "\n", "content_handler = ContentHandler()\n", "\n", "sm_llm = SagemakerEndpoint(\n", " endpoint_name=_MODEL_CONFIG_[\"huggingface-text2text-flan-t5-xl\"][\"endpoint_name\"],\n", " region_name=aws_region,\n", " model_kwargs=parameters,\n", " content_handler=content_handler,\n", ")" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Our prompt has 121 tokens\n" ] } ], "source": [ "num_tokens = sm_llm.get_num_tokens(prompt)\n", "print (f\"Our prompt has {num_tokens} tokens\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Philosophy (from Greek:, philosophia, 'love of wisdom') is the systematized study of general and fundamental questions, such as those about existence, reason, knowledge, values, mind, and language.\n" ] } ], "source": [ "output = sm_llm(prompt)\n", "print (output)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "tags": [] }, "outputs": [], "source": [ "prompt = \"\"\"\n", "summary: Write a ~ 50 word summary of the following text:\n", "\n", "TEXT:\n", "Philosophy (from Greek: φιλοσοφία, philosophia, 'love of wisdom') \\\n", "is the systematized study of general and fundamental questions, \\\n", "such as those about existence, reason, knowledge, values, mind, and language. \\\n", "Some sources claim the term was coined by Pythagoras (c. 570 – c. 495 BCE), \\\n", "although this theory is disputed by some. Philosophical methods include questioning, \\\n", "critical discussion, rational argument, and systematic presentation.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Philosophy is the systematic study of general and fundamental questions, such as those about existence, reason, knowledge, values, mind, and language.\n" ] } ], "source": [ "output = sm_llm(prompt)\n", "print (output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarize a couple paragraphs - Prompt Templates\n", "\n", "Prompt templates are a great way to dynamically place text within your prompts. 
They are like [python f-strings](https://realpython.com/python-f-strings/) but specialized for working with language models.\n", "\n", "We're going to look at 2 short Paul Graham essays." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "tags": [] }, "outputs": [], "source": [ "paul_graham_essays = ['data/PaulGrahamEssaySmall/getideas.txt', 'data/PaulGrahamEssaySmall/noob.txt']\n", "\n", "essays = []\n", "\n", "for file_name in paul_graham_essays:\n", "    with open(file_name, 'r') as file:\n", "        essays.append(file.read())" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Essay #1: January 2023(Someone fed my essays into GPT to make something that could answer\n", "questions based on them, then asked it where good ideas come from. The\n", "answer was ok, but not what I would have said. This is what I would have said.)The way to get new ideas is to notice anomalies: what seems strange,\n", "\n", "\n", "Essay #2: January 2020When I was young, I thought old people had everything figured out.\n", "Now that I'm old, I know this isn't true.I constantly feel like a noob. It seems like I'm always talking to\n", "some startup working in a new field I know nothing about, or reading\n", "a book about a topic I don't understand well\n", "\n" ] } ], "source": [ "for i, essay in enumerate(essays):\n", "    print (f\"Essay #{i+1}: {essay[:300]}\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's create a prompt template that will hold our instructions and a placeholder for the essay. In this example we only want a short, 10-20 word summary to come back." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "tags": [] }, "outputs": [], "source": [ "template = \"\"\"\n", "ten: Summarize the following article in 10-20 words:\n", "\n", "{essay}\n", "\"\"\"\n", "\n", "prompt = PromptTemplate(\n", "    input_variables=[\"essay\"],\n", "    template=template\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This prompt + essay has 208 tokens\n", "Summary: The way to get new ideas is to notice anomalies: what seems strange, or missing, or broken? You can see anomalies in everyday life (much of standup comedy is based on this), but the best place to look for them is at the frontiers of knowledge.\n", "\n", "\n", "This prompt + essay has 503 tokens\n", "Summary: The more you feel like a noob, the better.\n", "\n", "\n" ] } ], "source": [ "for essay in essays:\n", "    summary_prompt = prompt.format(essay=essay)\n", "\n", "    num_tokens = sm_llm.get_num_tokens(summary_prompt)\n", "    print (f\"This prompt + essay has {num_tokens} tokens\")\n", "\n", "    summary = sm_llm(summary_prompt)\n", "\n", "    print (f\"Summary: {summary.strip()}\")\n", "    print (\"\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarize a large document with multiple pages - Map Reduce\n", "\n", "If you have multiple pages you'd like to summarize, you'll likely run into a token limit. Token limits won't always be a problem, but it is good to know how to handle them if you run into the issue.\n", "\n", "The chain type \"Map Reduce\" is a method that helps with this. 
You first generate a summary of the smaller chunks (that fit within the token limit), and then you summarize the summaries.\n", "\n", "Check out [this video](https://www.youtube.com/watch?v=f9_BWhCI4Zo) for more information on how chain types work." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chains.summarize import load_summarize_chain\n", "from langchain.text_splitter import RecursiveCharacterTextSplitter" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "tags": [] }, "outputs": [], "source": [ "paul_graham_essay = 'data/PaulGrahamEssays/startupideas.txt'\n", "\n", "with open(paul_graham_essay, 'r') as file:\n", "    essay = file.read()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Token indices sequence length is longer than the specified maximum sequence length for this model (9568 > 1024). Running this sequence through the model will result in indexing errors\n" ] }, { "data": { "text/plain": [ "9568" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sm_llm.get_num_tokens(essay)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's too many tokens; let's split our text into chunks so they fit within the prompt limit. I'm going to use a chunk size of 2,000 characters.\n", "\n", "> You can think of tokens as pieces of words used for natural language processing. For English text, **1 token is approximately 4 characters** or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.\n", "\n", "This means each chunk should come to roughly 2,000 / 4 = ~500 tokens. But this will vary; each body of text/code will be different." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "tags": [] }, "outputs": [], "source": [ "text_splitter = RecursiveCharacterTextSplitter(separators=[\"\\n\\n\", \"\\n\"], chunk_size=2000, chunk_overlap=500)\n", "\n", "docs = text_splitter.create_documents([essay])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Now we have 30 documents and the first one has 16 tokens\n" ] } ], "source": [ "num_docs = len(docs)\n", "\n", "num_tokens_first_doc = sm_llm.get_num_tokens(docs[0].page_content)\n", "\n", "print (f\"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great, assuming the token count is consistent across the other docs, we should be good to go. Let's use LangChain's [load_summarize_chain](https://python.langchain.com/en/latest/use_cases/summarization.html) method with the `map_reduce` chain type for summarization. 
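Conceptually, `map_reduce` first summarizes each chunk on its own (the map step), then summarizes the concatenated chunk summaries (the reduce step). Here is a minimal sketch of that idea using our `sm_llm`, as an illustration rather than LangChain's actual implementation:\n", "\n", "```python\n", "# Map step: summarize every chunk independently\n", "chunk_summaries = [\n", "    sm_llm(\"Write a concise summary of the following:\\n\" + d.page_content) for d in docs\n", "]\n", "\n", "# Reduce step: summarize the combined chunk summaries into one final summary\n", "final_summary = sm_llm(\"Write a concise summary of the following:\\n\" + \"\\n\".join(chunk_summaries))\n", "```\n", "\n", "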
We first need to initialize our chain" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "tags": [] }, "outputs": [], "source": [ "summary_chain = load_summarize_chain(llm=sm_llm, chain_type='map_reduce',\n", " verbose=True # Set verbose=True if you want to see the prompts being used\n", " )" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new MapReduceDocumentsChain chain...\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"Want to start a startup? Get funded by\n", "Y Combinator.\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"November 2012The way to get startup ideas is not to try to think of startup\n", "ideas. It's to look for problems, preferably problems you have\n", "yourself.The very best startup ideas tend to have three things in common:\n", "they're something the founders themselves want, that they themselves\n", "can build, and that few others realize are worth doing. Microsoft,\n", "Apple, Yahoo, Google, and Facebook all began this way.\n", "ProblemsWhy is it so important to work on a problem you have? Among other\n", "things, it ensures the problem really exists. It sounds obvious\n", "to say you should only work on problems that exist. And yet by far\n", "the most common mistake startups make is to solve problems no one\n", "has.I made it myself. In 1995 I started a company to put art galleries\n", "online. But galleries didn't want to be online. It's not how the\n", "art business works. So why did I spend 6 months working on this\n", "stupid idea? Because I didn't pay attention to users. I invented\n", "a model of the world that didn't correspond to reality, and worked\n", "from that. I didn't notice my model was wrong until I tried\n", "to convince users to pay for what we'd built. Even then I took\n", "embarrassingly long to catch on. I was attached to my model of the\n", "world, and I'd spent a lot of time on the software. They had to\n", "want it!Why do so many founders build things no one wants? Because they\n", "begin by trying to think of startup ideas. That m.o. is doubly\n", "dangerous: it doesn't merely yield few good ideas; it yields bad\n", "ideas that sound plausible enough to fool you into working on them.At YC we call these \"made-up\" or \"sitcom\" startup ideas. Imagine\n", "one of the characters on a TV show was starting a startup. The\n", "writers would have to invent something for it to do. But coming\n", "up with good startup ideas is hard. It's not something you can do\n", "for the asking. So (unless they got amazingly lucky) the writers\n", "would come up with an idea that sounded plausible, but was actually\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"ideas that sound plausible enough to fool you into working on them.At YC we call these \"made-up\" or \"sitcom\" startup ideas. Imagine\n", "one of the characters on a TV show was starting a startup. The\n", "writers would have to invent something for it to do. But coming\n", "up with good startup ideas is hard. It's not something you can do\n", "for the asking. 
So (unless they got amazingly lucky) the writers\n", "would come up with an idea that sounded plausible, but was actually\n", "bad.For example, a social network for pet owners. It doesn't sound\n", "obviously mistaken. Millions of people have pets. Often they care\n", "a lot about their pets and spend a lot of money on them. Surely\n", "many of these people would like a site where they could talk to\n", "other pet owners. Not all of them perhaps, but if just 2 or 3\n", "percent were regular visitors, you could have millions of users.\n", "You could serve them targeted offers, and maybe charge for premium\n", "features. \n", "[1]The danger of an idea like this is that when you run it by your\n", "friends with pets, they don't say \"I would never use this.\" They\n", "say \"Yeah, maybe I could see using something like that.\" Even when\n", "the startup launches, it will sound plausible to a lot of people.\n", "They don't want to use it themselves, at least not right now, but\n", "they could imagine other people wanting it. Sum that reaction\n", "across the entire population, and you have zero users. \n", "[2]\n", "WellWhen a startup launches, there have to be at least some users who\n", "really need what they're making — not just people who could see\n", "themselves using it one day, but who want it urgently. Usually\n", "this initial group of users is small, for the simple reason that\n", "if there were something that large numbers of people urgently needed\n", "and that could be built with the amount of effort a startup usually\n", "puts into a version one, it would probably already exist. Which\n", "means you have to compromise on one dimension: you can either build\n", "something a large number of people want a small amount, or something\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"themselves using it one day, but who want it urgently. Usually\n", "this initial group of users is small, for the simple reason that\n", "if there were something that large numbers of people urgently needed\n", "and that could be built with the amount of effort a startup usually\n", "puts into a version one, it would probably already exist. Which\n", "means you have to compromise on one dimension: you can either build\n", "something a large number of people want a small amount, or something\n", "a small number of people want a large amount. Choose the latter.\n", "Not all ideas of that type are good startup ideas, but nearly all\n", "good startup ideas are of that type.Imagine a graph whose x axis represents all the people who might\n", "want what you're making and whose y axis represents how much they\n", "want it. If you invert the scale on the y axis, you can envision\n", "companies as holes. Google is an immense crater: hundreds of\n", "millions of people use it, and they need it a lot. A startup just\n", "starting out can't expect to excavate that much volume. So you\n", "have two choices about the shape of hole you start with. You can\n", "either dig a hole that's broad but shallow, or one that's narrow\n", "and deep, like a well.Made-up startup ideas are usually of the first type. Lots of people\n", "are mildly interested in a social network for pet owners.Nearly all good startup ideas are of the second type. Microsoft\n", "was a well when they made Altair Basic. 
There were only a couple\n", "thousand Altair owners, but without this software they were programming\n", "in machine language. Thirty years later Facebook had the same\n", "shape. Their first site was exclusively for Harvard students, of\n", "which there are only a few thousand, but those few thousand users\n", "wanted it a lot.When you have an idea for a startup, ask yourself: who wants this\n", "right now? Who wants this so much that they'll use it even when\n", "it's a crappy version one made by a two-person startup they've never\n", "heard of? If you can't answer that, the idea is probably bad.\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"in machine language. Thirty years later Facebook had the same\n", "shape. Their first site was exclusively for Harvard students, of\n", "which there are only a few thousand, but those few thousand users\n", "wanted it a lot.When you have an idea for a startup, ask yourself: who wants this\n", "right now? Who wants this so much that they'll use it even when\n", "it's a crappy version one made by a two-person startup they've never\n", "heard of? If you can't answer that, the idea is probably bad. \n", "[3]You don't need the narrowness of the well per se. It's depth you\n", "need; you get narrowness as a byproduct of optimizing for depth\n", "(and speed). But you almost always do get it. In practice the\n", "link between depth and narrowness is so strong that it's a good\n", "sign when you know that an idea will appeal strongly to a specific\n", "group or type of user.But while demand shaped like a well is almost a necessary condition\n", "for a good startup idea, it's not a sufficient one. If Mark\n", "Zuckerberg had built something that could only ever have appealed\n", "to Harvard students, it would not have been a good startup idea.\n", "Facebook was a good idea because it started with a small market\n", "there was a fast path out of. Colleges are similar enough that if\n", "you build a facebook that works at Harvard, it will work at any\n", "college. So you spread rapidly through all the colleges. Once you\n", "have all the college students, you get everyone else simply by\n", "letting them in.Similarly for Microsoft: Basic for the Altair; Basic for other\n", "machines; other languages besides Basic; operating systems;\n", "applications; IPO.\n", "SelfHow do you tell whether there's a path out of an idea? How do you\n", "tell whether something is the germ of a giant company, or just a\n", "niche product? Often you can't. The founders of Airbnb didn't\n", "realize at first how big a market they were tapping. Initially\n", "they had a much narrower idea. They were going to let hosts rent\n", "out space on their floors during conventions. They didn't foresee\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "\n", "\n", "\u001b[1m> Entering new StuffDocumentsChain chain...\u001b[0m\n", "\n", "\n", "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n", "Prompt after formatting:\n", "\u001b[32;1m\u001b[1;3mWrite a concise summary of the following:\n", "\n", "\n", "\"Y Combinator is a startup accelerator that has a reputation for being a tough place to get into.\n", "\n", "The way to get startup ideas is not to try to think of startup ideas. It's to look for problems, preferably problems you have yourself. 
The very best startup ideas tend to have three things in common: they're something the founders themselves want, that they themselves can build, and that few others realize are worth doing. Microsoft, Apple, Yahoo, Google, and Facebook all began this way. ProblemsWhy is it so important to work on a problem you have? Among other things, it ensures the problem really exists. And yet by far the most common mistake startups make is to solve problems no one has. I made it myself. In 1995 I started a company to put art galleries online. But galleries didn't want to be online. It's not how the art business works. So why did I spend 6 months working on this stupid idea? Because I didn't pay attention to users. I invented a model of the\n", "\n", "\"It's a good idea to have a startup that sounds plausible enough to fool you into working on it.\n", "\n", "A startup is a company that builds something that people want.\n", "\n", "The \"well\" shape of demand is a necessary condition for a good startup idea. But it's not a sufficient one.\"\n", "\n", "\n", "CONCISE SUMMARY:\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n" ] } ], "source": [ "output = summary_chain.run(docs[:5])" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'\"The way to get startup ideas is not to try to think of startup ideas. It\\'s to look for problems, preferably problems you have yourself. The very best startup ideas tend to have three things in common: they\\'re something the founders themselves want, that they themselves can build, and that few others realize are worth doing. Microsoft, Apple, Yahoo, Google, and Facebook all began this way. ProblemsWhy is it so important to work on a problem you have? Among other things, it ensures the problem really exists. And yet by far the most common mistake startups make is to solve problems no one has. I made it myself. In 1995 I started a company to put art galleries online. But galleries didn\\'t want to be online. It\\'s not how the art business works. So why did I spend 6 months working on this stupid idea? Because I didn\\'t pay attention to users. I invented a model of'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This summary is a great start, but let's modify it to get only the key points in the summary.\n", "\n", "In order to do this we will use custom prompts (like we did above) to instruct the model on what we need. 
Please note that the prompt format used in this notebook is based on Flan T5, taken from this [source](https://huggingface.co/jordiclive/flan-t5-11b-summarizer-filtered).\n", "\n", "The `map_prompt` follows the same pattern we used above (shown here for clarity), and I'll edit the `combine_prompt` to ask for only the key points." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "tags": [] }, "outputs": [], "source": [ "map_prompt = \"\"\"\n", "summary: Write a ~ 500 word summary of the following text:\n", "\"{text}\"\n", "\"\"\"\n", "map_prompt_template = PromptTemplate(template=map_prompt, input_variables=[\"text\"])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "tags": [] }, "outputs": [], "source": [ "combine_prompt = \"\"\"\n", "Cover only the key points of the text.\n", "```{text}```\n", "\"\"\"\n", "combine_prompt_template = PromptTemplate(template=combine_prompt, input_variables=[\"text\"])" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "tags": [] }, "outputs": [], "source": [ "summary_chain_key_points = load_summarize_chain(\n", "    llm=sm_llm,\n", "    chain_type='map_reduce',\n", "    map_prompt=map_prompt_template,\n", "    combine_prompt=combine_prompt_template,\n", "    # verbose=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of summarizing all 30 split documents (chunks), I am using only 15 of them to save time, since summarization can take a few minutes, and to avoid running out of memory on the notebook instance." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "tags": [] }, "outputs": [], "source": [ "output_key_points = summary_chain_key_points.run(docs[:15])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Y Combinator, the accelerator that helped launch the likes of Twitter and Facebook, has announced that it will fund a new batch of startups. The way to get startup ideas is not to try to think of startup ideas. It's to look for problems, preferably problems you have yourself. The very best startup ideas tend to have three things in common: they're something the founders themselves want, that they themselves can build, and that few others realize are worth doing. Microsoft, Apple, Yahoo, Google, and Facebook all began this way. ProblemsWhy is it so important to work on a problem you have? Among other things, it ensures the problem really exists. And yet by far the most common mistake startups make is to solve problems no one has. I made it myself. In 1995 I started a company to put art galleries online. But galleries didn't want to be online. 
It's not how the art business works\n" ] } ], "source": [ "print(output_key_points)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean Up\n", "*NOTE:* Please make sure to delete the endpoint if you are not using it, as it will incur charges." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'ResponseMetadata': {'RequestId': '259cf7f3-bedc-4e6b-bfe4-f89345bb5bb5',\n", "  'HTTPStatusCode': 200,\n", "  'HTTPHeaders': {'x-amzn-requestid': '259cf7f3-bedc-4e6b-bfe4-f89345bb5bb5',\n", "   'content-type': 'application/x-amz-json-1.1',\n", "   'content-length': '0',\n", "   'date': 'Fri, 19 May 2023 16:34:03 GMT'},\n", "  'RetryAttempts': 0}}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Specify the name of your endpoint\n", "endpoint_name_llm = _MODEL_CONFIG_[\"huggingface-text2text-flan-t5-xl\"][\"endpoint_name\"]\n", "\n", "# Create a low-level SageMaker service client\n", "sagemaker_client = boto3.client('sagemaker', region_name=aws_region)\n", "\n", "# Delete the endpoint configuration\n", "sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name_llm)\n", "\n", "# Delete the endpoint\n", "sagemaker_client.delete_endpoint(EndpointName=endpoint_name_llm)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge",
"vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 31, "_isFastLaunch": false, "category": 
"Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 
128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (Data Science 2.0)", "language": "python", "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-38" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }