{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Load Daily Data from AWS Data Exchange into S3 Bucket\n", "\n", "### Obtaining Data\n", "\n", "We obtain end-of-day (EOD) stock data from AWS Data Exchange and export it to an S3 bucket. Then we format the data for our daily dataset. This example uses the following data: https://aws.amazon.com/marketplace/pp/prodview-e2aizdzkos266\n", "\n", "### Output dataset\n", "\n", "- Contains 20 years of EOD data for one of the top 10 US companies\n", "- The data is saved into the specified S3 bucket as CSV.\n", "\n", "```\n", "hist_data_daily/{sym}.csv (columns: dt, sym, open, high, low, close, vol)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get the first S3 bucket whose name contains 'algotrading-'\n", "s3bucket=!(aws s3 ls | grep algotrading- | awk '{print $3}')\n", "s3bucket=s3bucket[0]\n", "s3bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# symbol\n", "sym='JNJ'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# copy daily EOD data from the bucket to the local directory\n", "!aws s3 cp s3://{s3bucket}/daily_adjusted_{sym}.csv ./" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# load the raw file, using the parsed 'timestamp' column as the index\n", "df = pd.read_csv(\"daily_adjusted_\"+sym+\".csv\",infer_datetime_format=True, parse_dates=['timestamp'], index_col=['timestamp'])\n", "# drop the adjustment columns; only the unadjusted prices and volume are kept\n", "del df[\"split_coefficient\"]\n", "del df[\"dividend_amount\"]\n", "del df[\"adjusted_close\"]\n", "# rename to the output schema: dt, sym, open, high, low, close, vol\n", "df.rename(inplace=True,columns={'volume':'vol'})\n", "df.index=df.index.rename('dt')\n", "df['sym']=sym\n", "df = df[['sym', 'open', 'high', 'low', 'close','vol']]\n", "# sort by date, oldest first\n", "df.sort_index(inplace=True)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# write the formatted dataset to a local CSV file\n", "df.to_csv(sym+'.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# upload the formatted file to the bucket, then remove the local copies\n", "!aws s3 cp {sym}.csv s3://{s3bucket}/hist_data_daily/\n", "!rm daily_adjusted_{sym}.csv\n", "!rm {sym}.csv" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 2 }