{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fb457992-5926-4ad4-986a-944cd5d7411a",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 音声認識モデル Wav2Vec と whisper を SageMaker 上でデプロイして試してみる\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f415f405-ea03-4ad3-af0c-bae046e96400",
   "metadata": {},
   "source": [
    "本チュートリアルでは、音声認識モデルである wav2vec2 や whisper を Studio Lab 経由で、AWS 環境上にデプロイする流れを体験してみます。  \n",
    "Amazon SageMaker 上で Hugging Face Inference DLC を使って MetaAI の [wav2vec2](https://arxiv.org/abs/2006.11477) や OpenAI の [whisper](https://cdn.openai.com/papers/whisper.pdf) モデルを手軽に利用することが可能です。  \n",
    "\n",
    "このサンプルでは、\n",
    "- Studio Lab 上で Amazon SageMaker の機能を利用する設定を行う\n",
    "- Studio Lab 経由で Amazon SageMaker 上に `transformers` のモデルをデプロイする\n",
    "- デプロイしたモデルに対して音声ファイルを投げて推論させてみる\n",
    "\n",
    "の３つを各モデルについて行っていきます。  \n",
    "なお、このサンプルは [Automatic Speech Recogntion with Hugging Face's Transformers & Amazon SageMaker](https://github.com/huggingface/notebooks/blob/main/sagemaker/20_automatic_speech_recognition_inference/sagemaker-notebook.ipynb) と[Sentence Embeddings with Hugging Face Transformers, Sentence Transformers and Amazon SageMaker - Custom Inference for creating document embeddings with Hugging Face's Transformers](https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb)をベースに作成しています。  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33fa33eb-1d68-41d6-8f52-a3b0703b81a2",
   "metadata": {},
   "source": [
    "## 環境のセットアップ"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "330578fb-886c-4300-8e3b-7d56a94346e0",
   "metadata": {},
   "source": [
    "### モジュールのインストール\n",
    "\n",
    "インストールが完了したら、カーネルを再起動することを忘れないでください。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4d096d0-346c-4b1a-8c49-98067216c3a2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -U boto3\n",
    "!pip install -U sagemaker\n",
    "!pip install -U transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "742c9146-bd9e-43b1-94e3-ee47eca24d96",
   "metadata": {
    "tags": []
   },
   "source": [
    "### AWS 環境のセットアップ\n",
    "\n",
    "\n",
    "下記のハンズオン資料の「2-2. AWS へ接続するための環境構築」「2-3. SageMaker Training Instance が利用する IAM ロールを作成する」で紹介されている手順にそって作業を進めてください。  \n",
    "\n",
    "- https://github.com/aws-samples/aws-ml-enablement-workshop/blob/main/notebooks/scenario_churn/customer_churn_sagemaker.ipynb \n",
    "\n",
    "\n",
    "作業が完了して作成された Role の ARN の値を以下のセルで置き換えてください。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18a56f4f-1102-4d67-81c5-79eee5893aec",
   "metadata": {},
   "outputs": [],
   "source": [
    "role = \"arn:aws:iam::000000000000:role/StudioLabWhisperExecutionRole\"  # TODO: コピペした値で置き換える"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e272c23-d3b3-47de-8aac-c7955660263d",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Wav2Vec のデプロイ\n",
    "\n",
    "まずは、 SageMaker SDK の HuggingFace 拡張を使って簡単にモデルをデプロイしてみましょう。  \n",
    "以下のページで公開されている「wav2vec2-large-960h」と呼ばれているモデルを使っていきます。  \n",
    "\n",
    "- https://huggingface.co/facebook/wav2vec2-large-960h\n",
    "\n",
    "この HuggingFace のページ上で SageMaker でモデルをデプロイするためのコードを手軽に生成できます。  \n",
    "\n",
    "まずは、ページにある「Deploy」ボタンをクリックします。  \n",
    "\n",
    "![](./imgs/101_wav2vec_deploy_button.png)\n",
    "\n",
    "いくつかデプロイの選択肢が出てくるので今回は「Amazon SageMaker」を選択します。  \n",
    "\n",
    "![](./imgs/102_wav2vec_deploy_select_sagemaker.png)\n",
    "\n",
    "Task を「Automatic Speech Recognition」、Configuration を「AWS」に設定すると deploy 用のコードが生成されます。  \n",
    "\n",
    "![](./imgs/103_wav2vec_deploy_generate_code.png)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2215719b-ea07-46e7-b65e-b93086b73873",
   "metadata": {},
   "source": [
    "コピーしたコードはデプロイのコードと推論のコードが含まれています。  \n",
    "このままでは動かないため、audio_serializer の追加や IAM Role 部分のコメントアウトなどをする必要があります。  \n",
    "\n",
    "ここで使用している `DataSerializer` は推論リクエストの際に、音声ファイルをリクエストする手順を簡易化してくれます。  \n",
    "\n",
    "\n",
    "注意！！！  \n",
    "**以下のコードを実行すると、SageMaker 上でエンドポイントがデプロイされます。  \n",
    "エンドポイントは時間課金されるため、エンドポイントの消し忘れに注意してください。**  \n",
    "意図せぬ課金を避けるためにも、最後の後片付けのステップは必ず実施してください。  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6ed2fc7-0678-424b-9590-dcfe153f45c2",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.huggingface import HuggingFaceModel\n",
    "from sagemaker.serializers import DataSerializer\n",
    "import sagemaker\n",
    "\n",
    "#### <変更箇所> 下の行はコメントアウトし、最初に設定した値を使う。 ####\n",
    "# role = sagemaker.get_execution_role()\n",
    "#### \n",
    "# Hub Model configuration. https://huggingface.co/models\n",
    "hub = {\n",
    "\t'HF_MODEL_ID':'facebook/wav2vec2-base-960h',\n",
    "\t'HF_TASK':'automatic-speech-recognition'\n",
    "}\n",
    "\n",
    "# create Hugging Face Model Class\n",
    "huggingface_model = HuggingFaceModel(\n",
    "\ttransformers_version='4.17.0',\n",
    "\tpytorch_version='1.10.2',\n",
    "\tpy_version='py38',\n",
    "\tenv=hub,\n",
    "\trole=role, \n",
    ")\n",
    "\n",
    "# deploy model to SageMaker Inference\n",
    "#### <変更箇所> 音声データをリクエストできるよう Audio Seriealizer を追加する ####\n",
    "audio_serializer = DataSerializer(content_type='audio/x-audio')  # x-audio にしておくことで複数の音声フォーマットに対応\n",
    "####\n",
    "predictor = huggingface_model.deploy(\n",
    "\tinitial_instance_count=1, # number of instances\n",
    "\tinstance_type='ml.m5.xlarge', # ec2 instance type\n",
    "    #### <変更箇所> Audio Serializer を追加 ####\n",
    "    serializer=audio_serializer\n",
    "    ####\n",
    ")\n",
    "\n",
    "#### <変更箇所> 推論部分は別途実装する必要があるのでコメントアウト\n",
    "# predictor.predict({\n",
    "# \t'inputs': \"sample1.flac\"\n",
    "# })\n",
    "####\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd993581-078c-4290-a9d4-7c855a5ec220",
   "metadata": {
    "tags": []
   },
   "source": [
    "### デプロイしたエンドポイントに対して音声データを送ってみる\n",
    "\n",
    "まずは、公開されている音声データを使ってどんな結果が返ってくるかみてみましょう。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25954710-5070-48a6-87e3-18e084ad42ac",
   "metadata": {},
   "source": [
    "huggingface.io 上で公開されている `libirispeech` と呼ばれる音声データセットの中からファイルを取得します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c7decec-69c7-4fc1-a572-7b553fbf8d18",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://cdn-media.huggingface.co/speech_samples/sample1.flac"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96877182-b786-4671-ae2a-a60b3b24f094",
   "metadata": {},
   "outputs": [],
   "source": [
    "audio_path = \"sample1.flac\"\n",
    "\n",
    "res = predictor.predict(data=audio_path)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5da81fb9-b271-41a1-b296-393b769170fa",
   "metadata": {},
   "source": [
    "また、音声データをバイナリ形式で送信する方法もあります。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87974496-f383-4d7c-8dd5-95e47d17a8f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "audio_path = \"sample1.flac\"\n",
    "\n",
    "with open(audio_path, \"rb\") as data_file:\n",
    "    audio_data = data_file.read()\n",
    "    res = predictor.predict(data=audio_data)\n",
    "    print(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77c0e3c6-956f-4b3f-b3fc-d9e3c0014b8d",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 後片付け\n",
    "\n",
    "先ほど作成したモデルとデプロイしたエンドポイントを最後に削除します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae92bb31-2fe0-40e3-919a-123e8cadf7cc",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "predictor.delete_model()\n",
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ba91f4a-3e33-45cf-89af-5c7b876aac2c",
   "metadata": {},
   "source": [
    "## whisper のデプロイ\n",
    "\n",
    "ここからは、wav2vec ではなく OpenAI から出された音声書き起こしモデルである whisper を使う方法を紹介していきます。  \n",
    "現状、HuggingFace DLC は [v4.17](https://github.com/huggingface/transformers/releases/tag/v4.17.0) 対応となっているのですが whisper は [v4.23.1](https://github.com/huggingface/transformers/releases/tag/v4.23.1) 以降でないと使うことができません。  \n",
    "\n",
    "このためには `requirements.txt` で `transformers==4.23.1` を追加する必要があります。その手順を見ていきましょう。  \n",
    "[こちらの Notebook](https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb) で記載されている手順をベースに考えていきます。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7db9d3c9-b271-4a15-8b96-debdba065031",
   "metadata": {},
   "source": [
    "まずは、 `requirements.txt` を格納するためのフォルダを作成します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "350a13ce-00c5-4277-96eb-ee439615ccf9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ecbe918-8e41-48f4-ad88-96f9d17aa9d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile code/requirements.txt\n",
    "transformers==4.23.1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b38b01a-322d-4a37-a88c-20f24e24df36",
   "metadata": {},
   "source": [
    "`requirements.txt` を上書きした推論コードをアップロードします。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85b93d9a-e2e5-44aa-b19d-66d6249e1863",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "sess = sagemaker.Session()\n",
    "default_bucket = sess.default_bucket()\n",
    "# default_bucket = \"<BUCKET_NAME>\"  # もし自前で作成したバケットを使う場合はこちらのコメントを解除して値を指定してください\n",
    "print(default_bucket)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ec0ccab-0ada-4978-8cbe-1e252ede45dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "repository = \"openai/whisper-base\"\n",
    "model_id=repository.split(\"/\")[-1]\n",
    "s3_location=f\"s3://{default_bucket}/custom_inference/{model_id}/model.tar.gz\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fdb09f3-a7c0-44ea-bbe1-2d5f9edb71c0",
   "metadata": {},
   "source": [
    "モデルを `git clone` で hf.co/models からダウンロードします。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91960318-0a3b-4794-a4a9-cfb78df31c02",
   "metadata": {},
   "outputs": [],
   "source": [
    "!git lfs install\n",
    "!git clone https://huggingface.co/$repository"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9efef79c-1d34-4c22-9867-d91887b0029e",
   "metadata": {},
   "source": [
    "先ほど作成した `requirements.txt` をコピーします。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1179dc22-dd63-4582-b8cd-b326b8b860dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7de55f2d-e7af-40ab-a964-e7418f034978",
   "metadata": {},
   "outputs": [],
   "source": [
    "!cp -r code/ $model_id/code/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bdfd6e0-082b-4725-87e6-9a28daa284fa",
   "metadata": {},
   "source": [
    "`requirements.txt` やモデルアーティファクトを含める形で `model.tar.gz` を作成します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "754271b5-0b1c-4eda-a88c-644b70cb6bb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%cd $model_id\n",
    "!tar zcvf model.tar.gz *"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf3ce21d-e931-4a46-ab91-0a22a8f6dd27",
   "metadata": {},
   "source": [
    "`model.tar.gz` を s3 にアップロードします。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2365fb0-2f1f-4324-9694-e08cb1e01b5a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "!aws s3 cp model.tar.gz $s3_location"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7d7ae63-65e0-4d2f-a97e-438b2997b87f",
   "metadata": {
    "tags": []
   },
   "source": [
    "### モデルのデプロイ\n",
    "\n",
    "先ほどアップロードしたモデルを使って deploy をしてみます。  \n",
    "wav2vec での例との差分は、model_data の値を追加で指定しているところです。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8733a62-aeb6-4c7f-b66b-eceb57940f19",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from sagemaker.huggingface import HuggingFaceModel\n",
    "from sagemaker.serializers import DataSerializer\n",
    "import sagemaker\n",
    "\n",
    "\n",
    "#### <変更箇所> hub の値は使わない ####\n",
    "# Hub Model configuration. https://huggingface.co/models\n",
    "hub = {\n",
    "\t'HF_MODEL_ID':'openai/whisper-base',\n",
    "\t'HF_TASK':'automatic-speech-recognition'\n",
    "}\n",
    "####\n",
    "\n",
    "# create Hugging Face Model Class\n",
    "huggingface_model = HuggingFaceModel(\n",
    "    #### <変更箇所> model_data の値として s3_location を使う ####\n",
    "    model_data = s3_location,\n",
    "    ####\n",
    "\ttransformers_version='4.17.0',\n",
    "\tpytorch_version='1.10.2',\n",
    "\tpy_version='py38',\n",
    "\tenv=hub,\n",
    "\trole=role, \n",
    ")\n",
    "\n",
    "# deploy model to SageMaker Inference\n",
    "audio_serializer = DataSerializer(content_type='audio/x-audio')\n",
    "predictor = huggingface_model.deploy(\n",
    "\tinitial_instance_count=1, # number of instances\n",
    "\tinstance_type='ml.m5.xlarge', # ec2 instance type\n",
    "    serializer=audio_serializer\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15c49e8d-ba58-4bda-9b49-6b67537a64d2",
   "metadata": {},
   "source": [
    "### 推論リクエストを投げてみる\n",
    "元のフォルダに移動します"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b73b6a31-1c6f-479b-8697-033a057465dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "%cd .."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a3b8d13-960a-4be4-af61-ddf8504f278b",
   "metadata": {},
   "source": [
    "(wav2vec のサンプルを実行していなければ) 下記のコメントアウトを外して音声ファイルのダウンロードを行います"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cf293c2-f040-4f16-8a2c-06cb0a5908b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !wget https://cdn-media.huggingface.co/speech_samples/sample1.flac"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2329cc4d-2e42-404a-9e80-7d00aeb37db2",
   "metadata": {},
   "source": [
    "書き起こしがうまく動いていそうか確認します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b62c5ef-bde5-4e20-afc4-4886bbc04231",
   "metadata": {},
   "outputs": [],
   "source": [
    "audio_path = \"sample1.flac\"\n",
    "\n",
    "res = predictor.predict(data=audio_path)\n",
    "print(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2335dca3-1d45-470f-946c-8d3b4897c79d",
   "metadata": {},
   "source": [
    "### 後片付け\n",
    "\n",
    "先ほど作成したモデルとデプロイしたエンドポイントを最後に削除します。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b29e8430-44e6-4d1a-95a5-59c94c4bf546",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_model()\n",
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48d24f87-df79-4edf-8c56-5d767c405656",
   "metadata": {},
   "outputs": [],
   "source": [
    "!aws s3 rm $s3_location"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "default:Python",
   "language": "python",
   "name": "conda-env-default-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}