{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load Daily Data from AWS Data Exchange into an S3 Bucket\n",
    "\n",
    "### Obtaining Data\n",
    "\n",
    "We obtain end-of-day (EOD) stock data from AWS Data Exchange and export it to an S3 bucket, then reformat it into our daily dataset. This example uses the following data product: https://aws.amazon.com/marketplace/pp/prodview-e2aizdzkos266\n",
    "\n",
    "### Output Dataset\n",
    "\n",
    "- Contains 20 years of EOD data for a single symbol, one of the top 10 US companies (`JNJ` in this example).\n",
    "- The data is saved as CSV to the `algotrading-` S3 bucket, under the `hist_data_daily/` prefix:\n",
    "\n",
    "```\n",
    "hist_data_daily/{sym}.csv (columns: dt,sym,open,high,low,close,vol)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# find the S3 bucket whose name contains 'algotrading-' (the first match is used)\n",
    "s3bucket=!(aws s3 ls | grep algotrading- | awk  '{print $3}')\n",
    "s3bucket=s3bucket[0]\n",
    "s3bucket"
   ]
  },
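  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an alternative to the shell pipeline, the bucket can also be looked up from Python. This is a minimal sketch that assumes `boto3` is installed and that the notebook's credentials can list buckets. It mirrors the `grep` filter (the first bucket whose name contains `algotrading-` is used) and fails with a clear message instead of an `IndexError` when nothing matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "\n",
    "# list all buckets visible to the current credentials and keep those matching the grep pattern\n",
    "names = [b['Name'] for b in boto3.client('s3').list_buckets()['Buckets']]\n",
    "matches = [n for n in names if 'algotrading-' in n]\n",
    "# fail loudly instead of raising a bare IndexError when no bucket matches\n",
    "assert matches, 'no S3 bucket containing algotrading- was found'\n",
    "s3bucket = matches[0]\n",
    "s3bucket"
   ]
  },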
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ticker symbol to process\n",
    "sym='JNJ'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# copy the daily EOD export for this symbol from S3 to the local directory\n",
    "! aws s3 cp s3://{s3bucket}/daily_adjusted_{sym}.csv ./"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# read the raw export, parsing the 'timestamp' column as the datetime index\n",
    "df = pd.read_csv('daily_adjusted_' + sym + '.csv', parse_dates=['timestamp'], index_col='timestamp')\n",
    "# drop the adjustment columns; only the unadjusted OHLCV prices are kept\n",
    "df = df.drop(columns=['split_coefficient', 'dividend_amount', 'adjusted_close'])\n",
    "# rename to the target schema: volume -> vol, timestamp -> dt\n",
    "df = df.rename(columns={'volume': 'vol'})\n",
    "df.index = df.index.rename('dt')\n",
    "df['sym'] = sym\n",
    "# order the columns and sort chronologically (oldest first)\n",
    "df = df[['sym', 'open', 'high', 'low', 'close', 'vol']]\n",
    "df = df.sort_index()\n",
    "df.head()"
   ]
  },
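  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before writing the file, a few assertions can catch schema problems early. This is an optional sketch, not part of the original workflow: the expected column list comes from the output description at the top, and the no-duplicate-dates check assumes the export holds at most one row per trading day."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optional sanity check: confirm the frame matches the documented schema\n",
    "assert list(df.columns) == ['sym', 'open', 'high', 'low', 'close', 'vol']\n",
    "assert df.index.name == 'dt'\n",
    "# rows should be in chronological order with no duplicate dates (assumption: one row per day)\n",
    "assert df.index.is_monotonic_increasing\n",
    "assert not df.index.duplicated().any()\n",
    "print(len(df), 'rows from', df.index.min().date(), 'to', df.index.max().date())"
   ]
  },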
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write the formatted data to a local CSV; the 'dt' index becomes the first column\n",
    "df.to_csv(sym + '.csv')"
   ]
  },
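  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An optional check (a sketch, not part of the original workflow) that the local CSV has the column layout described at the top before it is uploaded and deleted. Reading with `nrows=0` loads only the header row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read back only the header row and confirm the documented column layout\n",
    "header = pd.read_csv(sym + '.csv', nrows=0).columns.tolist()\n",
    "assert header == ['dt', 'sym', 'open', 'high', 'low', 'close', 'vol'], header\n",
    "header"
   ]
  },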
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# upload the formatted CSV to S3, then delete both local temporary files\n",
    "!aws s3 cp {sym}.csv s3://{s3bucket}/hist_data_daily/\n",
    "!rm daily_adjusted_{sym}.csv {sym}.csv"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}