{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## step-0-train-modelでトレーニングしたモデルに推論リクエストを投げる"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "このノートブックを実行する時のヒント：   \n",
    "- このノートブックは大容量のRawデータを読み込むため、<span style=\"color: orange; \">メモリー8GB以上のインスタンス</span>で実行してください\n",
    "- KernelはPython3（Data Science）で動作確認をしています。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import boto3\n",
    "import pandas as pd\n",
    "import time\n",
    "from datetime import datetime\n",
    "import model_utils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_data_for_pred(target, sampling_rate):\n",
    "    previous_year, previous_month = model_utils.get_previous_year_month(target.year, target.month)\n",
    "    df_previous_month = model_utils.get_raw_data(previous_year, previous_month, sampling_rate)\n",
    "    df_current_month = model_utils.get_raw_data(target.year, target.month, sampling_rate)\n",
    "    df_data = pd.concat([df_previous_month, df_current_month])\n",
    "    del df_previous_month\n",
    "    del df_current_month\n",
    "\n",
    "    # Extract features\n",
    "    df_features = model_utils.extract_features(df_data)\n",
    "    df_features = model_utils.filter_current_month(df_features, target.year, target.month)\n",
    "    \n",
    "    return df_features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 推論に利用するエンドポイントの名前を設定する\n",
    "# 同じモデルが複数の推論エンドポイントにデプロイされている場合に、テスト時間の短縮のためそれらを平行して利用することも可能。\n",
    "useable_endopoints = ['nyctaxi-xgboost-endpoint']\n",
    "# useable_endopoints = [\n",
    "#     'nyctaxi-xgboost-endpoint-0', \n",
    "#     'nyctaxi-xgboost-endpoint-1', \n",
    "#     'nyctaxi-xgboost-endpoint-2', \n",
    "#     'nyctaxi-xgboost-endpoint-3', \n",
    "# ]\n",
    "\n",
    "# 元データのサンプリングレート\n",
    "# モデル学習時と同じサンプリングレートを設定する\n",
    "sampling_rate = 20\n",
    "\n",
    "# 2020/03/03から2022/08/04までを7日間飛ばしで推論する設定\n",
    "# 1時間ごとに1日分の推論を実行し、hourly()のモニタリング設定であれば1時間に1日分のレポートが生成される\n",
    "start = '2020-03-03'\n",
    "end = '2022-08-04'\n",
    "day_step = 7\n",
    "freq=f'{day_step}D'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 指定した期間の推論を実行する\n",
    "モニタリング周期に合わせて推論を投入することで、1日分の推論をモニタリングジョブに与えることができる"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading data for 2020-03\n",
      "Predicting 2020-03-03 00:00:00 nyctaxi-xgboost-endpoint\n",
      "Sleep until next monitoring cycle. 3420 seconds\n",
      "\n"
     ]
    }
   ],
   "source": [
    "result_dir = 'prediction_results_model_quality'\n",
    "if not os.path.exists(result_dir):\n",
    "    os.makedirs(result_dir)\n",
    "\n",
    "\n",
    "currently_cached_month = 99\n",
    "results={}\n",
    "for i, target_date in enumerate(pd.date_range(start=start, end=end, freq=freq)):\n",
    "    # Load raw data and extract features for new month\n",
    "    if target_date.month != currently_cached_month:\n",
    "        print('Loading data for', target_date.strftime('%Y-%m'))\n",
    "        df_features = get_data_for_pred(target_date, sampling_rate)\n",
    "        currently_cached_month = target_date.month\n",
    "        \n",
    "    # Select next endpoint\n",
    "    endpoint = useable_endopoints[i%len(useable_endopoints)]\n",
    "    print('Predicting', target_date, endpoint)\n",
    "\n",
    "    # Exec prediction for the target date\n",
    "    df_pred = df_features[df_features.index == target_date].copy()\n",
    "    df_pred[['pred', 'inference_id']] = model_utils.exec_prediction(endpoint, df_pred)\n",
    "    df_pred.to_csv(f'{result_dir}/prediction-result-{target_date.strftime(\"%Y-%m-%d\")}.csv', index=False)\n",
    "    results[target_date.strftime('%Y-%m-%d')] = df_features\n",
    "\n",
    "    # Wait for next report cycle\n",
    "    if endpoint == useable_endopoints[-1]:\n",
    "        sleep_sec_until_5mins = (65 - pd.to_datetime(datetime.now()).minute) * 60\n",
    "        print('Sleep until next monitoring cycle. {} seconds\\n'.format(sleep_sec_until_5mins))\n",
    "        time.sleep(sleep_sec_until_5mins)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:ap-northeast-1:102112518831:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}