{ "cells": [ { "cell_type": "markdown", "id": "cfd8d0c0-2c2d-4751-8f5d-999d79ae6108", "metadata": {}, "source": [ "## Convert video to text with a speech-to-text model and a sentence embedding model\n", "\n", "In this notebook, we extract information from video/audio files with the [Whisper model](https://github.com/openai/whisper). By leveraging its multilingual support, we can extract transcripts from video files in different languages, even when a single video file mixes several languages. We provide the following options for Whisper inference:\n", "- Batch inference with a SageMaker Processing job: process large volumes of data and store the results in a vector database for a RAG solution.\n", "- Real-time inference with a SageMaker endpoint: run summarization or QA on a short video/audio file (less than 6 MB)." ] }, { "cell_type": "code", "execution_count": null, "id": "94a02764-403f-4c35-9754-e99e2a8d5b58", "metadata": { "tags": [] }, "outputs": [], "source": [ "!pip install -U sagemaker -q" ] }, { "cell_type": "markdown", "id": "38922488-e64f-45e2-983a-5c176f4e13ab", "metadata": {}, "source": [ "## Set up" ] }, { "cell_type": "code", "execution_count": null, "id": "3c209ddd-7a9a-4c7c-b582-6729ce88a4d9", "metadata": { "tags": [] }, "outputs": [], "source": [ "from sagemaker.huggingface import HuggingFaceProcessor\n", "from sagemaker import get_execution_role\n", "from sagemaker.processing import ProcessingInput, ProcessingOutput\n", "from sagemaker.huggingface import HuggingFaceModel\n", "import sagemaker\n", "import boto3\n", "import json\n", "\n", "# Use the notebook's execution role; fall back to a named IAM role if it cannot be resolved\n", "try:\n", "    role = sagemaker.get_execution_role()\n", "except ValueError:\n", "    iam = boto3.client('iam')\n", "    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']\n", "\n", "sess = sagemaker.session.Session()\n", "bucket = sess.default_bucket()\n", "prefix = \"sagemaker/rag_video\"\n", "folder_name = \"genai_workshop\"\n", "s3_input = f\"s3://{bucket}/{prefix}/raw_data/{folder_name}\" # Directory for video files\n", 
"s3_output_clips = f\"s3://{bucket}/{prefix}/clips\" # Directory for video clips\n", "s3_output_transcript = f\"s3://{bucket}/{prefix}/transcript\" # Directory for transcripts" ] }, { "cell_type": "code", "execution_count": null, "id": "7600cddd-9a23-4ab1-abf6-6cedd8b7fa55", "metadata": { "tags": [] }, "outputs": [], "source": [ "%store s3_output_transcript" ] }, { "cell_type": "markdown", "id": "5892e7ff-edfe-4112-9ccd-bc34b2fc1bde", "metadata": {}, "source": [ "## Upload test data to S3 bucket\n", "\n", "Download a sample video from YouTube, then upload it to the S3 input location." ] }, { "cell_type": "code", "execution_count": null, "id": "388a7d5d-eeaa-48bd-9b5b-9a352489cc89", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Install pytube, used below to download data from YouTube\n", "!pip install pytube" ] }, { "cell_type": "code", "execution_count": null, "id": "d5eee50a-9c66-456b-879f-98a09d85d87e", "metadata": { "tags": [] }, "outputs": [], "source": [ "from pytube import YouTube\n", "\n", "VIDEO_SAVE_DIRECTORY = \"./videos\"\n", "AUDIO_SAVE_DIRECTORY = \"./audio\"\n", "\n", "def download(video_url):\n", "    # Pick the highest-resolution progressive stream (video and audio in one file)\n", "    video = YouTube(video_url)\n", "    video = video.streams.get_highest_resolution()\n", "\n", "    try:\n", "        video.download(VIDEO_SAVE_DIRECTORY)\n", "    except Exception as e:\n", "        print(f\"Failed to download video: {e}\")\n", "        return\n", "\n", "    print(\"video was downloaded successfully\")\n", "\n", "def download_audio(video_url):\n", "    # Pick the first audio-only stream\n", "    video = YouTube(video_url)\n", "    audio = video.streams.filter(only_audio=True).first()\n", "\n", "    try:\n", "        audio.download(AUDIO_SAVE_DIRECTORY)\n", "    except Exception as e:\n", "        print(f\"Failed to download audio: {e}\")\n", "        return\n", "\n", "    print(\"audio was downloaded successfully\")" ] }, { "cell_type": "code", "execution_count": null, "id": "4888c5f4-52a0-4b3d-95e5-0bd0cca84af5", "metadata": { "tags": [] }, "outputs": [], "source": [ "# JAWS-UG AI/ML (Japanese) #16 Generative AI: https://www.youtube.com/watch?v=PkZenNAXtYs\n", "# New York Summit 2023 AIML: https://www.youtube.com/watch?v=1PkABWCJINM (36 minutes in total)" ] 
}, { "cell_type": "code", "execution_count": null, "id": "b8e6fe10-3eed-4578-b4ad-7b4e044f84a7", "metadata": { "tags": [] }, "outputs": [], "source": [ "download(\"https://www.youtube.com/watch?v=dBzCGcwYCJo\")" ] }, { "cell_type": "code", "execution_count": null, "id": "02daea10-0cbf-4f5d-b294-023f6d56b12f", "metadata": { "tags": [] }, "outputs": [], "source": [ "!aws s3 cp videos/genai_interview.mp4 {s3_input}/" ] }, { "cell_type": "markdown", "id": "7f00c2e3-c897-4a2c-a12e-b07c75ca5986", "metadata": { "tags": [] }, "source": [ "## Batch inference with SageMaker Processing" ] }, { "cell_type": "code", "execution_count": null, "id": "2e002118-b7dc-4915-9fe0-f80e3bbfe847", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Reuse the role resolved in the set-up cell\n", "hfp = HuggingFaceProcessor(\n", "    role=role,\n", "    instance_count=1,\n", "    instance_type='ml.p3.2xlarge',\n", "    transformers_version='4.28.1',\n", "    pytorch_version='2.0.0',\n", "    base_job_name='frameworkprocessor-hf',\n", "    py_version=\"py310\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "90383250-179b-4499-8583-fdca5320ee75", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# Input videos are copied from S3 into the container; clips and transcripts are uploaded back to S3\n", "hfp.run(\n", "    code='preprocessing.py',\n", "    source_dir=\"data_preparation\",\n", "    inputs=[\n", "        ProcessingInput(source=s3_input, destination=\"/opt/ml/processing/input\")\n", "    ],\n", "    outputs=[\n", "        ProcessingOutput(source='/opt/ml/processing/output_clips', destination=s3_output_clips),\n", "        ProcessingOutput(source='/opt/ml/processing/transcripts', destination=s3_output_transcript),\n", "    ],\n", "    arguments=[\n", "        \"--whisper-model\", \"whisper-large-v2\",\n", "        \"--target-language\", \"en\",\n", "        \"--sentence-embedding-model\", \"all-mpnet-base-v2\",\n", "        \"--order\", \"5\"\n", "    ]\n", ")" ] }, { "cell_type": "markdown", "id": "1efea032-76d4-478c-943a-a4be59d47ea7", "metadata": {}, "source": [ "## Deploy Whisper model to SageMaker for real-time inference" ] }, { "cell_type": "code", 
"execution_count": null, "id": "0ed73b43-b5c7-4e1a-bd15-8e020cab8f51", "metadata": { "tags": [] }, "outputs": [], "source": [ "endpoint_name = \"whisper-large-v2\"\n", "# Hub model configuration. https://huggingface.co/models\n", "hub = {\n", "    'HF_MODEL_ID': 'openai/whisper-large-v2',\n", "    'HF_TASK': 'automatic-speech-recognition',\n", "}\n", "\n", "# Create the Hugging Face model class\n", "huggingface_model = HuggingFaceModel(\n", "    transformers_version='4.26.0',\n", "    pytorch_version='1.13.1',\n", "    py_version='py39',\n", "    env=hub,\n", "    role=role\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "3d00b59f-c17c-4446-a900-55e2835c5625", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Deploy the model to SageMaker Inference\n", "predictor = huggingface_model.deploy(\n", "    endpoint_name=endpoint_name,\n", "    initial_instance_count=1, # number of instances\n", "    instance_type='ml.g5.xlarge' # EC2 instance type\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "9cf3d59a-0ccf-4baf-95d9-a292c43872dc", "metadata": { "tags": [] }, "outputs": [], "source": [ "client = boto3.client('sagemaker-runtime')\n", "file = \"test_raw_data/test.webm\"\n", "# Read the audio file as raw bytes to send as the request body\n", "with open(file, \"rb\") as f:\n", "    data = f.read()" ] }, { "cell_type": "code", "execution_count": null, "id": "e8e8d3ec-15a3-4e5f-b85b-19dd5ce98dd1", "metadata": { "tags": [] }, "outputs": [], "source": [ "response = client.invoke_endpoint(EndpointName=endpoint_name, ContentType='audio/x-audio', Body=data)\n", "output = json.loads(response['Body'].read())\n", "print(f\"Extracted text from the audio file:\\n {output['text']}\")" ] }, { "cell_type": "markdown", "id": "959e9aac-7690-478e-a703-9e9d433f9fb0", "metadata": { "tags": [] }, "source": [ "You can follow the section `Example - Build a multi-functional chatbot with Amazon SageMaker` in the [README](./README.md) to build a multi-functional chatbot with the Whisper endpoint.\n", "Please delete the endpoint once you no longer need it." 
] }, { "cell_type": "code", "execution_count": null, "id": "f23df075-6f8b-42f7-994c-b8510d87c3dd", "metadata": { "tags": [] }, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "code", "execution_count": null, "id": "1a8c26ef-a1e2-47af-a621-309a594a80fd", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "availableInstances": [ { "_defaultOrder": 0, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.t3.medium", "vcpuNum": 2 }, { "_defaultOrder": 1, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.t3.large", "vcpuNum": 2 }, { "_defaultOrder": 2, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.t3.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 3, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.t3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 4, "_isFastLaunch": true, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5.large", "vcpuNum": 2 }, { "_defaultOrder": 5, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 6, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 7, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 8, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 9, "_isFastLaunch": false, 
"category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 10, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 11, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 12, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.m5d.large", "vcpuNum": 2 }, { "_defaultOrder": 13, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.m5d.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 14, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.m5d.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 15, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.m5d.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 16, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.m5d.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 17, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.m5d.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 18, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.m5d.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 19, "_isFastLaunch": false, "category": "General purpose", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.m5d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 20, "_isFastLaunch": false, "category": "General 
purpose", "gpuNum": 0, "hideHardwareSpecs": true, "memoryGiB": 0, "name": "ml.geospatial.interactive", "supportedImageNames": [ "sagemaker-geospatial-v1-0" ], "vcpuNum": 0 }, { "_defaultOrder": 21, "_isFastLaunch": true, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 4, "name": "ml.c5.large", "vcpuNum": 2 }, { "_defaultOrder": 22, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 8, "name": "ml.c5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 23, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.c5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 24, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.c5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 25, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 72, "name": "ml.c5.9xlarge", "vcpuNum": 36 }, { "_defaultOrder": 26, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 96, "name": "ml.c5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 27, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 144, "name": "ml.c5.18xlarge", "vcpuNum": 72 }, { "_defaultOrder": 28, "_isFastLaunch": false, "category": "Compute optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.c5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 29, "_isFastLaunch": true, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g4dn.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 30, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g4dn.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 
31, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g4dn.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 32, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g4dn.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 33, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g4dn.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 34, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g4dn.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 35, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 61, "name": "ml.p3.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 36, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 244, "name": "ml.p3.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 37, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 488, "name": "ml.p3.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 38, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.p3dn.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 39, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.r5.large", "vcpuNum": 2 }, { "_defaultOrder": 40, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.r5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 41, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.r5.2xlarge", "vcpuNum": 8 
}, { "_defaultOrder": 42, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.r5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 43, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.r5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 44, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.r5.12xlarge", "vcpuNum": 48 }, { "_defaultOrder": 45, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 512, "name": "ml.r5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 46, "_isFastLaunch": false, "category": "Memory Optimized", "gpuNum": 0, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.r5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 47, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 16, "name": "ml.g5.xlarge", "vcpuNum": 4 }, { "_defaultOrder": 48, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 32, "name": "ml.g5.2xlarge", "vcpuNum": 8 }, { "_defaultOrder": 49, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 64, "name": "ml.g5.4xlarge", "vcpuNum": 16 }, { "_defaultOrder": 50, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 128, "name": "ml.g5.8xlarge", "vcpuNum": 32 }, { "_defaultOrder": 51, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 1, "hideHardwareSpecs": false, "memoryGiB": 256, "name": "ml.g5.16xlarge", "vcpuNum": 64 }, { "_defaultOrder": 52, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 192, "name": "ml.g5.12xlarge", 
"vcpuNum": 48 }, { "_defaultOrder": 53, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 4, "hideHardwareSpecs": false, "memoryGiB": 384, "name": "ml.g5.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 54, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 768, "name": "ml.g5.48xlarge", "vcpuNum": 192 }, { "_defaultOrder": 55, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4d.24xlarge", "vcpuNum": 96 }, { "_defaultOrder": 56, "_isFastLaunch": false, "category": "Accelerated computing", "gpuNum": 8, "hideHardwareSpecs": false, "memoryGiB": 1152, "name": "ml.p4de.24xlarge", "vcpuNum": 96 } ], "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "conda_pytorch_p310", "language": "python", "name": "conda_pytorch_p310" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 5 }