{
"cells": [
{
"cell_type": "markdown",
"id": "ad61b266-703b-48f5-abf1-cf6a4b315ff1",
"metadata": {
"tags": []
},
"source": [
"# 🧑‍🏫 Large Language Models for Education 🧑‍🏫\n",
"\n",
"\n",
"---\n",
"\n",
"### 🧑‍🎓 A note on Generative AI in Education\n",
"\n",
"By harnessing generative AI, educators can craft engaging, interactive learning experiences that support student growth. Many experts expect generative AI to reshape how knowledge is imparted, paving the way for new educational practices.\n",
"\n",
"---\n",
"\n",
"In this notebook, we demonstrate how to use large language models (LLMs) for use cases in education. LLMs can be used for tasks such as summarization, question-answering, or the generation of question & answer pairs.\n",
"\n",
"Text summarization is the task of shortening a text into a summary that captures its most important information. Here, we show how to use the state-of-the-art pre-trained **FLAN-T5** model for text summarization, as well as for the other tasks.\n",
"\n",
"In the first part of the notebook, we select and deploy the **FLAN-T5** model as a SageMaker real-time endpoint on a single `ml.p3.2xlarge` instance. SageMaker real-time endpoints are ideal for inference workloads with real-time, interactive, low-latency requirements. These endpoints are fully managed, automatically serve your models over HTTP, and support auto-scaling.\n",
"\n",
"Once the model is deployed and ready to use, we demonstrate how to query it and how to prompt it for summarization, question answering, and the generation of question & answer pairs.\n",
"\n",
"The final section is split into four demos: querying the Wikipedia article on quantum computing, an e-book text from [Project Gutenberg](https://www.gutenberg.org/), a scientific PDF article from arXiv.org, and the Australian Budget 2023-24 Medicare Overview."
]
},
{
"cell_type": "markdown",
"id": "882bdb91-fa4a-43d1-90e4-52a319b57743",
"metadata": {
"tags": []
},
"source": [
"## 1. Setting up the SageMaker Endpoint\n",
"\n",
"### 1.1 Install Python Dependencies and SageMaker setup"
]
},
{
"cell_type": "markdown",
"id": "a5805af2-10cd-4ad3-af9d-0b863b323f54",
"metadata": {
"tags": []
},
"source": [
"Before executing the notebook, some initial setup is required. This notebook requires the latest versions of sagemaker and the other libraries installed below."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c62662a0-fa82-475c-b671-2ebeb67467ae",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install --upgrade pip\n",
"!pip install -U sagemaker\n",
"!pip install -U langchain\n",
"!pip install -U PyPDF2"
]
},
{
"cell_type": "markdown",
"id": "a4afab6b-5f54-4ccd-b173-3670dc2ecd99",
"metadata": {},
"source": [
"We now load the SDK and helper scripts. First, we import the required packages, as shown below."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "011c244c-ca72-40f2-b7c9-7a94f25becf1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import sagemaker, boto3, json, logging\n",
"from sagemaker import image_uris, instance_types, model_uris, script_uris\n",
"from sagemaker.model import Model\n",
"from sagemaker.predictor import Predictor\n",
"from sagemaker.session import Session\n",
"from sagemaker.utils import name_from_base\n",
"from IPython.display import display, HTML, IFrame"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "13b46906-f59b-439e-b10a-cc65725177dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"logger = logging.getLogger('sagemaker')\n",
"logger.setLevel(logging.DEBUG)\n",
"logger.addHandler(logging.StreamHandler())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "20233a5f-08ce-4d04-bae4-379c32e9f0af",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using sagemaker==2.168.0\n",
"Using boto3==1.26.154\n"
]
}
],
"source": [
"logger.info(f'Using sagemaker=={sagemaker.__version__}')\n",
"logger.info(f'Using boto3=={boto3.__version__}')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ae27a1cd-6507-4a12-acfa-a4d2cf6289e3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Create the folder where the model weights will be stored\n",
"!mkdir -p download_dir\n",
"!mkdir -p source_documents_dir"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "747d0e08-797c-4e34-8db0-274d9ab58d89",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def get_sagemaker_session(local_download_dir) -> sagemaker.Session:\n",
" \"\"\"Return the SageMaker session.\"\"\"\n",
"\n",
" sagemaker_client = boto3.client(\n",
" service_name=\"sagemaker\", region_name=boto3.Session().region_name\n",
" )\n",
"\n",
" session_settings = sagemaker.session_settings.SessionSettings(\n",
" local_download_dir=local_download_dir\n",
" )\n",
"\n",
"    # Create a Session that stores downloaded model artifacts in local_download_dir\n",
" session = sagemaker.session.Session(\n",
" sagemaker_client=sagemaker_client, settings=session_settings\n",
" )\n",
"\n",
" return session"
]
},
{
"cell_type": "markdown",
"id": "e356a656-ae9e-44fc-857c-1f717012514e",
"metadata": {},
"source": [
"### 1.2 Deploying a SageMaker Endpoint"
]
},
{
"cell_type": "markdown",
"id": "be5a8649-2aac-4ab5-b4b1-c3af59893dde",
"metadata": {},
"source": [
"Using SageMaker, we can perform inference on the pre-trained model, even without fine-tuning it first on a new dataset. We start by retrieving the `deploy_image_uri`, `deploy_source_uri`, and `model_uri` for the pre-trained model. To host the pre-trained model, we create an instance of [`sagemaker.model.Model`](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) and deploy it. This may take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "84a5e29d-29af-4258-b107-a9d5a545e0b1",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using role arn:aws:iam::543088942053:role/service-role/AmazonSageMaker-ExecutionRole-20220709T155176 in region us-east-1\n"
]
}
],
"source": [
"sagemaker_session = Session()\n",
"aws_role = sagemaker_session.get_caller_identity_arn()\n",
"aws_region = boto3.Session().region_name\n",
"sess = sagemaker.Session()\n",
"\n",
"# We select the Flan-T5 XL model available in the Hugging Face container.\n",
"model_id, model_version = \"huggingface-text2text-flan-t5-xl\", \"*\"\n",
"_model_env_variable_map = {\n",
" \"huggingface-text2text-flan-t5-xl\": {\"MMS_DEFAULT_WORKERS_PER_MODEL\": \"1\"},\n",
"}\n",
"\n",
"endpoint_name = name_from_base(f\"jumpstart-example-{model_id}\")\n",
"instance_type = 'ml.p3.2xlarge'\n",
"logger.info(f'Using role {aws_role} in region {aws_region}')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "102add5b-e0f7-431f-8c36-bcb7d11a39b6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above.\n",
"deploy_image_uri = image_uris.retrieve(\n",
" region=None,\n",
" framework=None, # automatically inferred from model_id\n",
" image_scope=\"inference\",\n",
" model_id=model_id,\n",
" model_version=model_version,\n",
" instance_type=instance_type,\n",
")\n",
"\n",
"# Retrieve the inference script uri. This includes all dependencies and scripts for model loading, inference handling etc.\n",
"deploy_source_uri = script_uris.retrieve(\n",
" model_id=model_id, model_version=model_version, script_scope=\"inference\"\n",
")\n",
"\n",
"# Retrieve the model uri.\n",
"model_uri = model_uris.retrieve(\n",
" model_id=model_id, model_version=model_version, model_scope=\"inference\"\n",
")\n",
"\n",
"# Create the SageMaker model instance\n",
"if model_id in _model_env_variable_map:\n",
" # For those large models, we already repack the inference script and model\n",
" # artifacts for you, so the `source_dir` argument to Model is not required.\n",
" model = Model(\n",
" image_uri=deploy_image_uri,\n",
" model_data=model_uri,\n",
" role=aws_role,\n",
" predictor_cls=Predictor,\n",
" name=endpoint_name,\n",
" env=_model_env_variable_map[model_id],\n",
" )\n",
"else:\n",
" model = Model(\n",
" image_uri=deploy_image_uri,\n",
" source_dir=deploy_source_uri,\n",
" model_data=model_uri,\n",
" entry_point=\"inference.py\", # entry point file in source_dir and present in deploy_source_uri\n",
" role=aws_role,\n",
" predictor_cls=Predictor,\n",
" name=endpoint_name,\n",
" sagemaker_session=get_sagemaker_session(\"download_dir\"),\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "26efc2da-eeab-406e-8a8a-b730e1c75e96",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Creating model with name: jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762\n",
"CreateModel request: {\n",
" \"ModelName\": \"jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762\",\n",
" \"ExecutionRoleArn\": \"arn:aws:iam::543088942053:role/service-role/AmazonSageMaker-ExecutionRole-20220709T155176\",\n",
" \"PrimaryContainer\": {\n",
" \"Image\": \"763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04\",\n",
" \"Environment\": {\n",
" \"MMS_DEFAULT_WORKERS_PER_MODEL\": \"1\"\n",
" },\n",
" \"ModelDataUrl\": \"s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz\"\n",
" },\n",
" \"Tags\": [\n",
" {\n",
" \"Key\": \"aws-jumpstart-inference-model-uri\",\n",
" \"Value\": \"s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.1.2/infer-prepack-huggingface-text2text-flan-t5-xl.tar.gz\"\n",
" }\n",
" ]\n",
"}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deploying endpoint jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762 on 1 x ml.p3.2xlarge (this will take approximately 6-8 minutes)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Creating endpoint-config with name jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762\n",
"Creating endpoint with name jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"------------!CPU times: user 171 ms, sys: 16.4 ms, total: 187 ms\n",
"Wall time: 6min 33s\n"
]
}
],
"source": [
"%%time\n",
"# Deploy the Model. Note that we need to pass the Predictor class when deploying through the Model class\n",
"# so that we can run inference through the SageMaker API.\n",
"print(f'Deploying endpoint {endpoint_name} on 1 x {instance_type} (this will take approximately 6-8 minutes)')\n",
"try:\n",
" model_predictor = model.deploy(\n",
" initial_instance_count=1,\n",
" instance_type=instance_type,\n",
" predictor_cls=Predictor,\n",
" endpoint_name=endpoint_name,\n",
" )\n",
"except Exception as e:\n",
" print(f'Error: {e}')\n",
"    print('Two common reasons for this error:')\n",
"    print('1. You are in an AWS region that does not have the ml.p3.2xlarge instance type')\n",
" print('2. You have exceeded the service quota of this AWS account')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b47f6006-3b1a-48cb-847f-0891d084c30d",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully deployed endpoint jumpstart-example-huggingface-text2text-2023-06-27-19-10-45-762 on 1 x ml.p3.2xlarge\n"
]
}
],
"source": [
"print(f'Successfully deployed endpoint {endpoint_name} on 1 x {instance_type}')"
]
},
{
"cell_type": "markdown",
"id": "aa55310f-fab2-40d8-aa5a-0e2b2588adb0",
"metadata": {},
"source": [
"## 3. LLM Demos for Education"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "81be0382-3cd1-4707-baf2-8752ac46ef62",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import nlp_helper\n",
"nlp_helper.endpoint_name = endpoint_name"
]
},
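{
"cell_type": "markdown",
"id": "b7f3c2a1-5e6d-4f8a-9c0b-2d4e6f8a0b1c",
"metadata": {},
"source": [
"The helper functions in `nlp_helper` wrap all calls to the endpoint. As a rough, hypothetical sketch (the request schema with `text_inputs`, generation parameters, and a `generated_texts` response is assumed from the JumpStart text2text containers; the actual helper may differ), a prompt-to-text call reduces to `invoke_endpoint` on the SageMaker runtime:\n",
"\n",
"```python\n",
"import json\n",
"\n",
"def build_payload(prompt, max_length=256, temperature=1.0, seed=None):\n",
"    # Assumed JumpStart text2text request schema (hypothetical sketch).\n",
"    payload = {'text_inputs': prompt, 'max_length': max_length, 'temperature': temperature}\n",
"    if seed is not None:\n",
"        payload['seed'] = seed\n",
"    return json.dumps(payload)\n",
"\n",
"def generate_text_from_prompt(prompt, endpoint_name, **gen_kwargs):\n",
"    # Send the prompt to the deployed FLAN-T5 endpoint and return the first generation.\n",
"    import boto3  # imported lazily so the payload helper can be used without AWS credentials\n",
"    client = boto3.client('sagemaker-runtime')\n",
"    response = client.invoke_endpoint(\n",
"        EndpointName=endpoint_name,\n",
"        ContentType='application/json',\n",
"        Body=build_payload(prompt, **gen_kwargs),\n",
"    )\n",
"    return json.loads(response['Body'].read())['generated_texts'][0]\n",
"```"
]
},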
{
"cell_type": "markdown",
"id": "62ad5a23-488d-4b9f-8d2e-7a07130a9a2c",
"metadata": {
"tags": []
},
"source": [
"In this notebook, we use the following texts to demonstrate summarization tasks and the generation of question & answer pairs.\n",
"\n",
"1. Quantum Computing from Wikipedia: https://en.wikipedia.org/wiki/Quantum_computing\n",
"\n",
"2. Winnie-the-Pooh (by Alan Alexander Milne): https://www.gutenberg.org/ebooks/67098.txt.utf-8\n",
"3. Attention Is All You Need (by Vaswani et al.): https://arxiv.org/pdf/1706.03762.pdf\n",
"4. Australian Budget 2023-24 Overview: https://budget.gov.au/content/overview/download/budget_overview-20230511.pdf\n",
"\n",
"Note that for this notebook, we use the FLAN-T5 XL model for simplicity and ease of deployment; additional fine-tuning or larger models would be required to get better results."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "94d29ef7-870c-456a-8a58-b60c788de24b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Download pdfs and texts with the `curl` command. Flags used here are `-L` (allow redirects),\n",
"# `-s` (for silent mode) and `-o` (to specify the output file name).\n",
"\n",
"# Attention is all you need (by Vaswani et al)\n",
"!curl -Ls https://arxiv.org/pdf/1706.03762.pdf -o source_documents_dir/attention.pdf\n",
"# Australian Budget 2023-24 Overview\n",
"!curl -Ls https://budget.gov.au/content/overview/download/budget_overview-20230511.pdf -o source_documents_dir/aus_budget_overview-2023-24.pdf"
]
},
{
"cell_type": "markdown",
"id": "a0ee88e7-9ad6-4e04-add3-7d3842ed3c3c",
"metadata": {},
"source": [
"### 3.1 Wikipedia Page on Quantum Computing\n",
"\n",
"In this example, a Wikipedia page on Quantum Computing provides the context. The LLM is used for keyword generation, a point-by-point summary, and a set of question and answer pairs. You may also replace the Wikipedia URL with a website, blog, or news article of your choice."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "72ea7b6f-6174-48ba-aa28-33ca600dcee6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"NCHARS = 400 # We will show just the first and last 400 characters of each extracted text. Increase this number for more context.\n",
"NQUESTIONS = 10 # The number of Q&A pairs that we will generate."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "61f5f17c-bb54-4bdd-bf2c-2c63b4d9ae9d",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wiki_paragraphs = nlp_helper.extract_paragraphs_from_html(\n",
" nlp_helper.download_url_text('https://en.wikipedia.org/wiki/Quantum_computing')\n",
")[1:11] # Skip the first paragraph and keep the next ten\n",
"wiki_txt = '\\n\\n'.join(wiki_paragraphs)\n",
"# print(f'{wiki_txt[:NCHARS]}...\\n\\n...{wiki_txt[-NCHARS:]}')\n",
"IFrame('https://en.wikipedia.org/wiki/Quantum_computing', width=800, height=300)"
]
},
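{
"cell_type": "markdown",
"id": "c8d4e5f6-1a2b-4c3d-8e9f-0a1b2c3d4e5f",
"metadata": {},
"source": [
"The `extract_paragraphs_from_html` helper is part of `nlp_helper` and is not shown here. A minimal sketch of such a paragraph extractor, using only the standard library's `html.parser` (the real helper may use a richer parser), could look like this:\n",
"\n",
"```python\n",
"from html.parser import HTMLParser\n",
"\n",
"class ParagraphExtractor(HTMLParser):\n",
"    # Collect the text content of every <p> element (minimal sketch).\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self._depth = 0\n",
"        self._chunks = []\n",
"        self.paragraphs = []\n",
"\n",
"    def handle_starttag(self, tag, attrs):\n",
"        if tag == 'p':\n",
"            self._depth += 1\n",
"            self._chunks = []\n",
"\n",
"    def handle_endtag(self, tag):\n",
"        if tag == 'p' and self._depth:\n",
"            self._depth -= 1\n",
"            text = ''.join(self._chunks).strip()\n",
"            if text:\n",
"                self.paragraphs.append(text)\n",
"\n",
"    def handle_data(self, data):\n",
"        if self._depth:\n",
"            self._chunks.append(data)\n",
"\n",
"def extract_paragraphs_from_html(html_text):\n",
"    parser = ParagraphExtractor()\n",
"    parser.feed(html_text)\n",
"    return parser.paragraphs\n",
"```"
]
},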
{
"cell_type": "markdown",
"id": "d95a891c-e574-4ea3-ac7f-0d01e52aa38e",
"metadata": {},
"source": [
"#### Keyword Generation"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "114fd4e1-c7c7-4ab4-9a85-4f057bafb4d8",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"quantum, computing, superposition, qubit\n"
]
}
],
"source": [
"KEY_WORDS = nlp_helper.generate_text_from_prompt(\n",
" f'FIND KEY WORDS\\n\\nContext:\\n{wiki_txt}\\nKey Words:',\n",
" seed=12345\n",
")\n",
"key_word_list = KEY_WORDS.split(', ')\n",
"print(KEY_WORDS)"
]
},
{
"cell_type": "markdown",
"id": "c9859bd8-e743-4a34-817f-06405a06a2d4",
"metadata": {},
"source": [
"#### Summary of key points\n",
"\n",
"For each of the paragraphs, let's create a short summary."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "fe371a26-340a-41d3-9138-36479871dce8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"summary = []\n",
"for i, x in enumerate(wiki_paragraphs):\n",
" summary.append(f'{i+1}. {nlp_helper.summarize(x[:1500])}')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "284d81bd-fb49-46d6-8142-9a8b8393e504",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<h3>Key Points</h3><br>1. A quantum computer is a computer that exploits quantum mechanical phenomena.\n",
"2. Quantum computing is a branch of computer science that uses quantum mechanics to perform calculations.\n",
"3. Qubits are the building blocks of quantum computing.\n",
"4. Quantum computing is the study of the computational complexity of problems with respect to quantum computers.\n",
"5. The field of quantum computing has been developing rapidly in recent years, largely due to the development of quantum computing hardware and software.\n",
"6. The field of quantum computing has been developing rapidly since the 1980s.\n",
"7. Quantum computing was first proposed in the 1980s by the late Richard Feynman, who was a pioneer in the field of quantum theory.\n",
"8. Quantum computing is the development of a computer that uses quantum physics to perform computations that are impossible for classical computers.\n",
"9. The emergence of quantum computing has been a major factor in the development of quantum computing.\n",
"10. Quantum computing is a new field of research."
],
"text/plain": [
""
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HTML(\n",
"    '<h3>Key Points</h3><br>' + \n",
" '\\n'.join([ f'{x}' for x in summary ])\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "84410c39-d71c-44f4-95d2-744cf527a9e1",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'Quantum computing is the development of a computer that uses quantum physics to perform computations that are impossible for classical computers.'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The 10 points above can be used to create an even shorter summary.\n",
"nlp_helper.summarize('\\n'.join(summary))"
]
},
{
"cell_type": "markdown",
"id": "0aa90c0b-31e4-4286-9995-48ca4f66efad",
"metadata": {
"tags": []
},
"source": [
"#### Checking for correct answers\n",
"\n",
"In this example, we generate a \"correct answer\" based on the text. We then write two student answers:\n",
"one incorrect, and one correct but paraphrased slightly differently from the official \"correct answer\".\n",
"The LLM is used to check whether each student's answer is correct."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "7b40942c-d2fc-4121-8143-a5875d5d4f66",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A quantum computer is a computer that exploits quantum mechanical phenomena.\n"
]
}
],
"source": [
"prompt=f\"\"\"Context:{wiki_txt}\n",
"What is quantum computing?\"\"\"\n",
"answer = nlp_helper.generate_text_from_prompt(prompt, temperature=0.01)\n",
"print(answer)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "c12861c0-991c-4690-8a2c-a6d655ab3742",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"no\n"
]
}
],
"source": [
"prompt=f\"\"\"Context:{wiki_txt}\n",
"Question: What is quantum computing?\n",
"Answer: {answer}\n",
"Student: Quantum computing is using computers with quantum dots\n",
"Is this answer correct?\"\"\"\n",
"print(nlp_helper.generate_text_from_prompt(prompt, temperature=0.01))"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "3d663433-8b29-4dfe-8e11-542c48ab919e",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"yes\n"
]
}
],
"source": [
"prompt=f\"\"\"Context:{wiki_txt}\n",
"Question: What is quantum computing?\n",
"Answer: {answer}\n",
"Student: Quantum computing involves using computers that make use of quantum mechanics\n",
"Is this answer correct?\"\"\"\n",
"print(nlp_helper.generate_text_from_prompt(prompt, temperature=0.01))"
]
},
{
"cell_type": "markdown",
"id": "2ab797e5-539b-49e4-8d8b-01113d7b7a04",
"metadata": {
"tags": []
},
"source": [
"#### Generation of Question & Answer Pairs"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "9cd4876c-b5cb-498d-ac7f-0b0b52c415a9",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"1. Question: What limits the applications in non military applications of quantum computers?\n",
" Answer: noise in quantum gates"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"2. Question: What is the basic unit of information in quantum computing?\n",
" Answer: qubit"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3. Question: What computer can do calculations exponentially faster than today's computers in practical scenarios?\n",
" Answer: quantum computer"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"4. Question: Who is the inventor of the quantum Turing machine?\n",
" Answer: Paul Benioff"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5. Question: Why is quantum computing a rather distant dream?\n",
" Answer: The threshold theorem shows how increasing the number of qubits can mitigate errors,"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"6. Question: How are quantum Physicists, theoretical physicists also known as what?\n",
" Answer: quantum mechanics"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"7. Question: What was the most practical application for computers during WWII?\n",
" Answer: wartime cryptography"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"8. Question: What are the main factors that limit the potential of a quantum computer?\n",
" Answer: Physically engineering high-quality qubits has proven challenging. If a physical qubit is not sufficiently isolated from its environment, it suffers from quantum decoherence, introducing noise into calculations"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"9. Question: How are quantum computer related to the bit in traditional digital electronics?\n",
" Answer: Unlike a classical bit, a qubit can exist in a superposition of its two \"basis\" states, which loosely means that it is in both states simultaneously"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"10. Question: What is the current state of the technology?\n",
" Answer: the current state of the art is largely experimental and impractical, with several obstacles to useful applications"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nlp_helper.create_qna_pairs(wiki_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE, seed=1234)"
]
},
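{
"cell_type": "markdown",
"id": "d9e0f1a2-3b4c-4d5e-9f6a-7b8c9d0e1f2a",
"metadata": {},
"source": [
"The `create_qna_pairs` helper is also part of `nlp_helper`. One plausible sketch of the underlying technique is to prompt the model once to pose a question about the context and a second time to answer that question; the function names and prompt wording below are illustrative, not the actual implementation:\n",
"\n",
"```python\n",
"def build_question_prompt(context):\n",
"    # Ask the model to pose a question about the context (illustrative sketch).\n",
"    return f'Context: {context}\\n\\nGenerate a question about the context above:'\n",
"\n",
"def build_answer_prompt(context, question):\n",
"    # Ask the model to answer the question from the same context.\n",
"    return f'Context: {context}\\n\\nQuestion: {question}\\nAnswer:'\n",
"\n",
"def create_qna_pairs(context, n_questions, generate):\n",
"    # `generate` is a prompt-to-text callable, e.g. nlp_helper.generate_text_from_prompt.\n",
"    pairs = []\n",
"    for _ in range(n_questions):\n",
"        question = generate(build_question_prompt(context))\n",
"        answer = generate(build_answer_prompt(context, question))\n",
"        pairs.append((question, answer))\n",
"    return pairs\n",
"```"
]
},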
{
"cell_type": "markdown",
"id": "be8b220a-7c2d-4d23-abe8-8ff0323b0e3c",
"metadata": {},
"source": [
"### 3.2 Winnie-the-Pooh (by Alan Alexander Milne)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "26672f79-120b-496f-897c-d4b603ee285e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"winnie_the_pooh = nlp_helper.download_url_text('https://www.gutenberg.org/ebooks/67098.txt.utf-8')"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "89f2bd54-248a-45b1-a882-2159f0a9e4df",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CHAPTER III\n",
"\n",
" IN WHICH POOH AND PIGLET GO HUNTING\n",
" AND NEARLY CATCH A WOOZLE\n",
"\n",
"\n",
"The Piglet lived in a very grand house in the middle of a beech-tree,\n",
"and the beech-tree was in the middle of the forest, and the Piglet lived\n",
"in the middle of the house. Next to his house was a piece of broken\n",
"board which had: \"TRESPASSERS W\" on it. When Christopher Robin asked the\n",
"Piglet what it meant, he said it was his grandfather's name, and had\n",
"been in the family for a long time, Christopher Robin said you\n",
"_couldn't_ be called Trespassers W, and Piglet said yes, you could,\n",
"because his grandfather was, and it was short for Trespassers Will,\n",
"which was short for Trespassers William. And his grandfather had had two\n",
"names in case he lost one--Trespassers after an uncle, and William after\n",
"Trespassers.\n",
"\n",
"\"I've got two names,\" said Christopher Robin carelessly.\n",
"\n",
"\"Well, there you are, that proves it,\" said Piglet.\n",
"\n",
"One fine winter's day when Piglet was brushing away the snow in front of\n",
"his house, he happened to look up, and there was Winnie-the-Pooh. Pooh\n",
"was walking round and round in a circle, thinking of something else, and\n",
"when Piglet called to him, he just went on walking.\n",
"\n",
"\"Hallo!\" said Piglet, \"what are _you_ doing?\"\n",
"\n",
"\"Hunting,\" said Pooh.\n",
"\n",
"\"Hunting what?\"\n",
"\n",
"\"Tracking something,\" said Winnie-the-Pooh very mysteriously.\n",
"\n",
"\"Tracking what?\" said Piglet, coming closer.\n",
"\n",
"\"That's just what I ask myself. I ask myself, What?\"\n",
"\n",
"\"What do you think you'll answer?\"\n",
"\n",
"\"I shall have to wait until I catch up with it,\" said Winnie-the-Pooh.\n",
"\"Now, look there.\" He pointed to the ground in front of him. \"What do\n",
"you see there?\"\n",
"\n",
"\"Tracks,\" said Piglet. \"Paw-marks.\" He gave a little squeak of\n",
"excitement. \"Oh, Pooh! Do you think it's a--a--a Woozle?\"\n",
"\n",
"\"It may be,\" said Pooh. \"Sometimes it is, and sometimes it isn't. You\n",
"never can tell with paw-marks.\"\n",
"\n",
"With these few words he went on tracking, and Piglet, after watching him\n",
"for a minute or two, ran after him. Winnie-the-Pooh had come to a sudden\n",
"stop, and was bending over the tracks in a puzzled sort of way.\n",
"\n",
"\"What's the matter?\" asked Piglet.\n",
"\n",
"\"It's a very funny thing,\" said Bear, \"but there seem to be\n",
"_two_ animals now. This--whatever-it-was--has been joined by\n",
"another--whatever-it-is--and the two of them are now proceeding\n",
"in company. Would you mind coming with me, Piglet, in case they\n",
"turn out to be Hostile Animals?\"\n",
"\n",
"Piglet scratched his ear in a nice sort of way, and said that he had\n",
"nothing to do until Friday, and would be delighted to come, in case it\n",
"really _was_ a Woozle.\n",
"\n",
"\"You mean, in case it really is two Woozles,\" said Winnie-the-Pooh, and\n",
"Piglet said that anyhow he had nothing to do until Friday. So off they\n",
"went together.\n",
"\n",
"There was a small spinney of larch trees just here, and it seemed as if\n",
"the two Woozles, if that is what they were, had been going round this\n",
"spinney; so round this spinney went Pooh and Piglet after them; Piglet\n",
"passing the time by telling Pooh what his Grandfather Trespassers W had\n",
"done to Remove Stiffness after Tracking, and how his Grandfather\n",
"Trespassers W had suffered in his later years from Shortness of Breath,\n",
"and other matters of interest, and Pooh wondering what a Grandfather was\n",
"like, and if perhaps this was Two Grandfathers they were after now, and,\n",
"if so, whether he would be allowed to take one home and keep it, and\n",
"what Christopher Robin would say. And still the tracks went on in front\n",
"of them....\n",
"\n",
"Suddenly Winnie-the-Pooh stopped, and pointed excitedly in front of him.\n",
"\"_Look!_\"\n",
"\n",
"\"_What?_\" said Piglet, with a jump. And then, to show that he hadn't\n",
"been frightened, he jumped up and down once or twice more in an\n",
"exercising sort of way.\n",
"\n",
"\"The tracks!\" said Pooh. \"_A third animal has joined the other two!_\"\n",
"\n",
"\"Pooh!\" cried Piglet. \"Do you think it is another Woozle?\"\n",
"\n",
"\"No,\" said Pooh, \"because it makes different marks. It is either Two\n",
"Woozles and one, as it might be, Wizzle, or Two, as it might be, Wizzles\n",
"and one, if so it is, Woozle. Let us continue to follow them.\"\n",
"\n",
"So they went on, feeling just a little anxious now, in case the three\n",
"animals in front of them were of Hostile Intent. And Piglet wished very\n",
"much that his Grandfather T. W. were there, instead of elsewhere, and\n",
"Pooh thought how nice it would be if they met Christopher Robin suddenly\n",
"but quite accidentally, and only because he liked Christopher Robin so\n",
"much. And then, all of a sudden, Winnie-the-Pooh stopped again, and\n",
"licked the tip of his nose in a cooling manner, for he was feeling more\n",
"hot and anxious than ever in his life before. _There were four animals\n",
"in front of them!_\n",
"\n",
"\"Do you see, Piglet? Look at their tracks! Three, as it were, Woozles,\n",
"and one, as it was, Wizzle. _Another Woozle has joined them!_\"\n",
"\n",
"And so it seemed to be. There were the tracks; crossing over each other\n",
"here, getting muddled up with each other there; but, quite\n"
]
}
],
"source": [
"x = winnie_the_pooh.find('CHAPTER III')\n",
"pooh_txt = winnie_the_pooh[x:x+5000] # Extract the first 5000 characters of chapter 3\n",
"print(pooh_txt)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "c62792e0-f044-4c34-aa19-b8126f5d0b61",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The Piglet lived in a very grand house in the middle of a beech-tree, and the beech-tree was in the middle of the forest, and the Piglet lived in the middle of the house. Next to his house was a piece of broken board which had: \"TRESPASSERS W\" on it. When Christopher Robin asked the Piglet what it meant, he said it was his grandfather\\'s name, and had been in the family for a long time. Christopher Robin said you _could_ be called Trespassers W, and Piglet said yes, you could, because his grandfather was, and it was short for Trespassers Will'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(pooh_txt, \"What is the storyline here?\")"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "55b3aad2-2775-41b8-b669-0df6a31d4c56",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'Winnie-the-Pooh'"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(pooh_txt, \"Who is the main character?\")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "5fc08e87-7445-45a6-a3a5-1edac39dc696",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'Winnie-the-Pooh and Piglet nearly catch a Woozle.'"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(pooh_txt, \"What happens at the end?\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "c2c908fd-9ba1-4bf7-9f36-c8767afd92ea",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"1. Question: Who thought the three woozles in front of them might be dangerous to them?\n",
" Answer: Winnie-the-Pooh"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"2. Question: What did the broken board say?\n",
" Answer: \"TRESPASSERS W\""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3. Question: What was needed to be considered when using paw-marks?\n",
" Answer: different animals"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"4. Question: Why did Piglet's grandfather have two names?\n",
" Answer: in case he lost one"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5. Question: What did Piglet and Winnie 'the-Pooh notice before the two of them noticed each other?\n",
" Answer: Tracks"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"6. Question: Why didn't the \"trespassers W\" sign spell the proper name for the Piglet?\n",
" Answer: Christopher Robin said you _couldn't_ be called Trespassers W, and Piglet said yes, you could, because his grandfather was, and it was short for Trespassers Will, which was short for Trespassers William."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"7. Question: Why was it so interesting that Trespasser WILL was short for Trespassers William\n",
" Answer: Trespassers W was short for Trespassers Will, which was short for Trespassers William."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"8. Question: Why were there different animal tracks around the spinney?\n",
" Answer: The Woozles were not the same animal."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"9. Question: What was the name of the three animals they were following\n",
" Answer: Woozles"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"10. Question: Who left the other animals behind?\n",
" Answer: Wizzle"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nlp_helper.create_qna_pairs(pooh_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE, seed=12345)"
]
},
{
"cell_type": "markdown",
"id": "c64acef7-d4fe-49b7-9ca4-b1bb94fdc4c2",
"metadata": {},
"source": [
"### 3.3 Attention Is All You Need (Vaswani et al.)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "b93e8343-42b5-4368-ad82-1b8cb90be303",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"attention = nlp_helper.extract_pages('source_documents_dir/attention.pdf')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "e59b3356-fa39-4ba7-849d-f1c5aeb98886",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"attention_txt = '\\n\\n'.join(attention[1:3] + attention[9:10]) # Pages at indices 1, 2 (for the intro) and 9 (for the conclusion)\n",
"# print(f'{attention_txt[:NCHARS]}...\\n\\n\\n...{attention_txt[-NCHARS:]}')\n",
"IFrame('source_documents_dir/attention.pdf', width=800, height=400)"
]
},
{
"cell_type": "markdown",
"id": "470b58dc-40b9-4a21-9057-bfc4cdef2c20",
"metadata": {
"tags": []
},
"source": [
"#### Question Answering"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "edbf35b5-2c79-4924-a50b-49e6fef53c38",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'We propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(attention_txt, \"What is the main gist of the paper?\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "8674d667-7411-40ab-8b4c-90ce98ef1581",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The Transformer is the first sequence transduction model relying entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(attention_txt, \"What is the problem being solved?\")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "e69d7c38-d788-41f1-a7f0-5abf463c0a73",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'Transformer is the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.'"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(attention_txt, \"What is the conclusion of the paper?\")"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "ef8c05dc-a837-4454-b1aa-d13345dee2be",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The text will be split up into chunks of 1202 characters and summarized\n"
]
}
],
"source": [
"chunk_size = len(attention_txt)//8\n",
"print(f'The text will be split up into chunks of {chunk_size} characters and summarized')"
]
},
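{
"cell_type": "markdown",
"id": "3c1d9a70-5f2e-4e2b-9f01-5d2e8c41b7f3",
"metadata": {},
"source": [
"Fixed-size character chunks like the ones used below can cut the text mid-sentence. As a minimal alternative sketch (independent of `nlp_helper`, and only a rough heuristic since it splits on `'. '`), whole sentences can be packed greedily into chunks:\n",
"\n",
"```python\n",
"def chunk_by_sentence(text, max_chars):\n",
"    # Greedily pack whole sentences into chunks of at most max_chars characters.\n",
"    chunks, current = [], ''\n",
"    for sentence in text.replace('\\n', ' ').split('. '):\n",
"        candidate = f'{current}. {sentence}' if current else sentence\n",
"        if len(candidate) <= max_chars:\n",
"            current = candidate\n",
"        else:\n",
"            if current:\n",
"                chunks.append(current)\n",
"            current = sentence\n",
"    if current:\n",
"        chunks.append(current)\n",
"    return chunks\n",
"```\n",
"\n",
"Each chunk could then be passed to `nlp_helper.summarize` in place of the raw character slices."
]
},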
{
"cell_type": "code",
"execution_count": 37,
"id": "cf3771d6-11de-416e-be0c-ac91b84d93f7",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"Key Points"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"1. The emergence of recurrent models and encoder-decoder architectures in language modeling and transduction has been a major factor in the development of a wide range of models for language modeling and transduction."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"2. Attention mechanisms have been used in a variety of models to reduce the number of operations required to draw global dependencies between input and output positions [1,2,3,7,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,75,76,77,78,79,80,81,82,83,84,85,88,89,90,91,92,93,94,95,96,97,98,99,99,100,101,102,103,104,105,106,107,108,109,110,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3. The Transformer is a model for transduction of sequences of inputs and outputs based on self-attention."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"4. odels have an encoder-decoder structure [ 5,2,35]. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1, respectively. 3.1 Encoder and Decoder Stacks Encoder: The encoder is composed of a stack of N= 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position- wise fully connected feed-forward network. We employ a residual connection [ 11] around each of the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is LayerNorm( x+ Sublayer( x)), where Sublayer( x) is the function implemented by the sub-layer i."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5. te encoder: The decoder is also composed of a stack of N= 6 identical layers. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack. Similar to the encoder, we employ residual connections around each of the sub-layers, followed by layer normalization."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"6. We present a small-data RNN model for German-to-English translation."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"7. Transformer, the first sequence transduction model based entirely on attention, replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles. We are excited about the future of attention-based models and plan to apply them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large input and output modalities such as images, audio and video."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"8. We present a new model for machine translation that is able to learn a tensor representation of the target language and a tensor representation of the target language."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(HTML('Key Points'))\n",
"summary = []\n",
"for i in range(8):\n",
" x0 = i*chunk_size\n",
" x1 = (i+1)*chunk_size\n",
" line_summary = f'{i+1}. {nlp_helper.summarize(attention_txt[x0:x1])}'\n",
" display(HTML(line_summary))\n",
" summary.append(line_summary)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "b10a987c-c5f6-4d99-a587-1aadf5b37878",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"1. Question: What is the architecture of their model?\n",
" Answer: Most competitive neural sequence transduction models have an encoder-decoder structure [ 5,2,35]"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"2. Question: What is Transformer and how is the architecture different from usual models?\n",
" Answer: Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence- aligned RNNs or convolution."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3. Question: What type of architecture do they use?\n",
" Answer: a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"4. Question: How is Transformer architected?\n",
" Answer: Model Architecture Most competitive neural sequence transduction models have an encoder-decoder structure [5,2,35]. Here, the encoder maps an input sequence of symbol representations (x1;:::;x n)to a sequence of continuous representations z= (z1;:::;z n). Given z, the decoder then generates an output sequence (y1;:::;y m)of symbols one element at a time. At each step the model is auto-regressive [10,11], consuming the previously generated symbols as additional input when generating the next. The Transformer follows this overall architecture using stacked self-attention"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5. Question: What is the model architecture of Transformer model?\n",
" Answer: Most competitive neural sequence transduction models have an encoder-decoder structure [ 5,2,35]"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"6. Question: What method do we introduce to represent the output information as a binary vector using attention mechanisms based on the previous vectors?\n",
" Answer: Transformer"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"7. Question: What is a Transformer model?\n",
" Answer: We propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"8. Question: How does Transformer improve the state-of-the-art model on English-to-German and English-to-French toyota j250+?\n",
" Answer: Transformer is the first transduction model relying entirely on attention to compute representations of its input and output without using sequence- aligned RNNs or convolution."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"9. Question: What does Transformer rely on other than recurrent layers?\n",
" Answer: attention mechanism"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"10. Question: Does the Transformer achieve state of the art compared to the previous state of the art models\n",
" Answer: Yes"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nlp_helper.create_qna_pairs(attention_txt, NQUESTIONS, output_style=nlp_helper.QNA_OUTPUT_STYLE)"
]
},
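{
"cell_type": "markdown",
"id": "7e4b2c91-8d3a-4f60-b5a2-1c9e6f0d4a88",
"metadata": {},
"source": [
"Several of the generated questions above are near-duplicates (multiple variants asking about the Transformer architecture). If we assume the pairs are available as a list of `(question, answer)` tuples — an assumption, since the helper here renders HTML directly — a simple de-duplication pass could look like:\n",
"\n",
"```python\n",
"def dedupe_pairs(pairs):\n",
"    # Keep the first occurrence of each question, ignoring case and trailing punctuation.\n",
"    seen, unique = set(), []\n",
"    for question, answer in pairs:\n",
"        key = question.lower().strip(' ?')\n",
"        if key not in seen:\n",
"            seen.add(key)\n",
"            unique.append((question, answer))\n",
"    return unique\n",
"```\n",
"\n",
"A stricter filter might also drop pairs whose answer text does not appear in the source document."
]
},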
{
"cell_type": "markdown",
"id": "cdf13af1-b6be-4d34-9ad3-09f5cb70320d",
"metadata": {
"tags": []
},
"source": [
"### 3.4 Australian Budget 2023-24 Overview (Medicare)"
]
},
{
"cell_type": "markdown",
"id": "12091219-63d9-49f2-8f41-272be432731e",
"metadata": {},
"source": [
"In this example, we look at the Australian Budget 2023-24, focusing on the improvements to Medicare."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "b747aa8a-bc64-45a6-ad40-37a30d8b483d",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Historic investment in Medicare \n",
"Strengthening Medicare\n",
"Medicare is the foundation of Australiaβs primary health care system. In this \n",
"Budget, the Government is investing $5.7 billion over 5 years from 2022β23 to \n",
"strengthen Medicare and make it cheaper and easier to see a doctor.\n",
"The Strengthening Medicare package includes the largest investment in bulk \n",
"billing incentives ever. The Government is...\n",
"\n",
"\n",
"...llion over 4 years to establish the Primary Care and Midwifery \n",
"Scholarships program, supporting registered nurses and midwives in \n",
"post-graduate study to improve their skills \n",
"β’ $31.6 million over 2 years for improved training arrangements for \n",
"international medical students working rur al and remote locations.\n",
"26 Strengthening MedicareStronger foundations for a better future | Budget 2023β24\n"
]
}
],
"source": [
"# Extract the pages from the Budget overview and work on the Medicare-related pages\n",
"aus_budget_overview = nlp_helper.extract_pages('source_documents_dir/aus_budget_overview-2023-24.pdf')\n",
"txt_aus_budget_overview_medicare = '\\n\\n'.join(aus_budget_overview[24:27]) # Pages at indices 24 to 26 cover the Medicare budget.\n",
"print(f'{txt_aus_budget_overview_medicare[:NCHARS]}...\\n\\n\\n...{txt_aus_budget_overview_medicare[-NCHARS:]}')"
]
},
{
"cell_type": "markdown",
"id": "d28a3481-4f21-41f7-8656-521675f3b436",
"metadata": {},
"source": [
"#### Summarization & Question Answering"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "068785da-9875-4d05-bcd6-e9354e59b934",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The Government is investing $5.7 billion over 5 years from 2022-23 to strengthen Medicare and make it cheaper and easier to see a doctor.'"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.summarize(txt_aus_budget_overview_medicare)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "ccd65b47-502d-4cd9-8896-a03160bf7415",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"1. Question: How many Australians will be enabled to access a GP with no out-of-pocket cost for the consultation?\n",
" Answer: 11.6 million"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"2. Question: What will support 11.6 million Australians to access a GP with no out-of-pocket costs?\n",
" Answer: The Government is tripling the incentive paid to GPs to bulk bill consultations for families with children under 16 years, pensioners and Commonwealth concession card holders, at a cost of $3.5 billion."
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"3. Question: How much of the bulk billing incentive will be paid to GPs to enable general practice consultations over 6 minutes in length\n",
" Answer: $3.5 billion"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"4. Question: What are the 2 ways the Government wants to increase access to primary care for Australians?\n",
" Answer: Increasing access to primary care with coordinated teams"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"5. Question: How much will each family, pensioner and Commonwealth concession card holder pay to see a GP without paying any out of pocket costs?\n",
" Answer: $0"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"nlp_helper.create_qna_pairs(txt_aus_budget_overview_medicare, 5, output_style=nlp_helper.QNA_OUTPUT_STYLE)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "bb9bbb17-56ff-4607-9d80-88e623e51168",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'telehealth general practice services which are between 6 and 20 minutes in length'"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(txt_aus_budget_overview_medicare,\"What is a Level B consultation?\")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "ee86044c-3f69-433c-89e4-b5ead1c17b55",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'$5.7 billion'"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(txt_aus_budget_overview_medicare, \"How much is the government investing?\")"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "6deec971-3be8-46e6-88e7-7d3be2d20d8b",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The Government will also invest in new services to help homeless people and culturally and linguistically diverse communities to access primary care.'"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nlp_helper.ask(txt_aus_budget_overview_medicare, \"Is the government helping homeless people?\")"
]
},
{
"cell_type": "markdown",
"id": "8ae610e3-1ecb-4dd8-94a4-137792b45b38",
"metadata": {},
"source": [
"## 4. [Optional] LLM Demos for Education Part II\n",
"\n",
"In this section, we deploy a Gradio app that takes a URL as input, and allows us to answer questions based on the content of the web page."
]
},
{
"cell_type": "markdown",
"id": "77c2f91c-56e4-497d-a107-d2a2824ebff0",
"metadata": {},
"source": [
"### 4.1 Gradio Demo App"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "11f4c760-b110-41d5-818f-cf848da229fe",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%capture\n",
"!pip install gradio"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "3c58ed7a-a92b-44c5-bfbb-74d7063b9bd5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import gradio as gr"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "f803d584-df90-4e15-8923-0f19d7fb90cf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def url2context(url):\n",
" paragraph_list = extract_paragraphs_from_html(\n",
" download_url_text(url)\n",
" )[1:11] # We will skip the first paragraph, and take only 10 paragraphs\n",
" return '\\n\\n'.join(paragraph_list)"
]
},
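{
"cell_type": "markdown",
"id": "9a5d1f33-2b7c-4e18-a6d4-0f8b3c7e5d21",
"metadata": {},
"source": [
"`url2context` re-downloads and re-parses the page on every call, so each chatbot question pays the network cost again. A memoized wrapper (a sketch that assumes `url2context` is defined as above) keeps recently seen pages in memory:\n",
"\n",
"```python\n",
"from functools import lru_cache\n",
"\n",
"@lru_cache(maxsize=32)\n",
"def cached_url2context(url):\n",
"    # Download and parse each URL at most once per process.\n",
"    return url2context(url)\n",
"```\n",
"\n",
"The Gradio callbacks below could call `cached_url2context` instead of `url2context`."
]
},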
{
"cell_type": "code",
"execution_count": 48,
"id": "d43bd2c0-b8d6-4b39-8a77-06851e28184f",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running on local URL: http://127.0.0.1:7860\n",
"Running on public URL: https://8b966605fc7f3a55d1.gradio.live\n",
"\n",
"This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"
]
},
{
"data": {
"text/html": [
""
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def chatbot(prompt, temperature, max_length, url):\n",
" if url == \"\":\n",
" return generate_text_from_prompt(prompt, max_length, temperature)\n",
" else:\n",
" return ask(url2context(url), prompt)\n",
"\n",
"def summary(url):\n",
" context = url2context(url)\n",
" key_words = generate_text_from_prompt(\n",
" f'FIND KEY WORDS\\n\\nContext:\\n{context}\\nKey Words:'\n",
" )\n",
" return f\"\"\"{summarize(context)}\\n\\nKey words: {key_words}\"\"\"\n",
"\n",
"with gr.Blocks() as demo:\n",
" gr.Markdown(\"## Flan T5 Chatbot Demo\")\n",
" with gr.Row():\n",
" with gr.Column():\n",
" url = gr.Textbox(label=\"URL\", placeholder=\"Enter URL here\", lines=1, show_label=True,\n",
" value=\"https://mmrjournal.biomedcentral.com/articles/10.1186/s40779-022-00416-w\"\n",
" # value=\"https://k12.libretexts.org/Bookshelves/Science_and_Technology/Biology/03%3A_Genetics/3.14%3A_Human_Genome\"\n",
" )\n",
" with gr.Row():\n",
" with gr.Column():\n",
" prompt = gr.Textbox(\n",
" label=\"Prompt\", placeholder=\"Enter your prompt here\", lines=3, show_label=True,\n",
" value=f\"How do mRNA vaccines work for pancreatic cancer treatment?\")\n",
" temperature = gr.Slider(label=\"Temperature\", minimum=0.0, maximum=1.0, value=0.5)\n",
" max_length = gr.Slider(label=\"Max Length\", minimum=20, maximum=400, value=100)\n",
" with gr.Column():\n",
" output = gr.Textbox(label=\"Output\", lines=10, show_label=True)\n",
" with gr.Row():\n",
" with gr.Column():\n",
" submit_btn = gr.Button(\"Submit\")\n",
" with gr.Column():\n",
" summary_btn = gr.Button(\"Summary\")\n",
" submit_btn.click(\n",
" fn=chatbot,\n",
" inputs=[prompt, temperature, max_length, url],\n",
" outputs=output,\n",
" api_name=\"chatbot\",\n",
" queue=False\n",
" )\n",
" summary_btn.click(\n",
" fn=summary,\n",
" inputs=[url],\n",
" outputs=output,\n",
" api_name=\"summary\",\n",
" queue=False\n",
" )\n",
"\n",
"demo.launch(share=True)"
]
},
{
"cell_type": "markdown",
"id": "fe022d9b-7c72-4e72-b47b-35346eac069f",
"metadata": {},
"source": [
"## 5. Cleanup\n",
"\n",
"Uncomment and run the cell below to delete the SageMaker model and endpoint."
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "dd21b3e6-5055-41f5-9b8c-9faaeea66282",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#model_predictor.delete_model()\n",
"#model_predictor.delete_endpoint()"
]
},
{
"cell_type": "markdown",
"id": "09014e18-33e5-4185-ab90-9957c25d1f39",
"metadata": {
"tags": []
},
"source": [
"To completely shut down SageMaker, go to File > Shut Down > Shutdown All"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c6d7b2-9f98-42f7-8148-8306c0d85bb9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"availableInstances": [
{
"_defaultOrder": 0,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.t3.medium",
"vcpuNum": 2
},
{
"_defaultOrder": 1,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.t3.large",
"vcpuNum": 2
},
{
"_defaultOrder": 2,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.t3.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 3,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.t3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 4,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 5,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 6,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 7,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 8,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 9,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 10,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 11,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 12,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5d.large",
"vcpuNum": 2
},
{
"_defaultOrder": 13,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5d.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 14,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5d.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 15,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5d.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 16,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5d.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 17,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5d.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 18,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5d.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 19,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 20,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": true,
"memoryGiB": 0,
"name": "ml.geospatial.interactive",
"supportedImageNames": [
"sagemaker-geospatial-v1-0"
],
"vcpuNum": 0
},
{
"_defaultOrder": 21,
"_isFastLaunch": true,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.c5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 22,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.c5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 23,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.c5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 24,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.c5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 25,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 72,
"name": "ml.c5.9xlarge",
"vcpuNum": 36
},
{
"_defaultOrder": 26,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 96,
"name": "ml.c5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 27,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 144,
"name": "ml.c5.18xlarge",
"vcpuNum": 72
},
{
"_defaultOrder": 28,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.c5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 29,
"_isFastLaunch": true,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g4dn.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 30,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g4dn.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 31,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g4dn.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 32,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g4dn.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 33,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g4dn.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 34,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g4dn.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 35,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 61,
"name": "ml.p3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 36,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 244,
"name": "ml.p3.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 37,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 488,
"name": "ml.p3.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 38,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.p3dn.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 39,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.r5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 40,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.r5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 41,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.r5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 42,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.r5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 43,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.r5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 44,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.r5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 45,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 512,
"name": "ml.r5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 46,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.r5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 47,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 48,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 49,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 50,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 51,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 52,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 53,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.g5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 54,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.g5.48xlarge",
"vcpuNum": 192
},
{
"_defaultOrder": 55,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 56,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4de.24xlarge",
"vcpuNum": 96
}
],
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}