{"cells": [{"metadata": {"collapsed": true, "id": "06c88850-7b8e-42d1-a6ad-ae2360698688"}, "cell_type": "markdown", "source": "\n\n

IBM-AWS Immersion Day Lab 4

Notebook 2: Predict future COVID-19 cases for the Wallonia region with a Long Short-Term Memory (LSTM) model

"}, {"metadata": {"id": "499cfa88fe8141488b0726ac89c783c7"}, "cell_type": "markdown", "source": "In this lab exercise, you will learn about Long Short-Term Memory (LSTM), a popular recurrent neural network architecture for sequence data. You will use it to build a time-series model from historical data of total COVID-19 cases, then use the trained model to predict future COVID-19 cases."}, {"metadata": {"id": "b63f82d05b984223974a40be6249f092"}, "cell_type": "markdown", "source": "### Import required libraries"}, {"metadata": {"id": "13ad35bac1f84a0785d69d1a7753b4fb"}, "cell_type": "code", "source": "import boto3\nimport numpy as np \nimport pandas as pd \nfrom keras.layers.core import Dense, Dropout\nfrom keras.layers.recurrent import LSTM\nfrom keras.models import Sequential\nfrom tensorflow.keras.optimizers import Adam\nimport math, time\nfrom sklearn.preprocessing import MinMaxScaler\nimport matplotlib.pyplot as plt\nfrom numpy import newaxis\nfrom keras.callbacks import EarlyStopping\nimport tensorflow\nfrom io import StringIO\nimport datetime\nimport io\nimport itertools\nfrom project_lib import Project\nproject = Project.access()\n%matplotlib inline", "execution_count": 1, "outputs": []}, {"metadata": {"id": "54ef6c9c375945638d46540986ef68a5"}, "cell_type": "markdown", "source": "### Load the dataset from Amazon S3 into a pandas DataFrame\n>Note: you can add the comment `# @hidden_cell` to the code cell below. Cloud Pak for Data will automatically hide the cell before sharing it."}, {"metadata": {"id": "10f3b48312fd423b804adb94be85c6f0"}, "cell_type": "code", "source": "", "execution_count": 2, "outputs": [{"data": {"text/plain": " DATE REGION Total_cases\n0 15/03/20 Wallonia 383\n1 16/03/20 Wallonia 568\n2 17/03/20 Wallonia 654\n3 18/03/20 Wallonia 925\n4 19/03/20 Wallonia 1245\n5 20/03/20 Wallonia 1468\n6 21/03/20 Wallonia 1377\n7 22/03/20 Wallonia 1464\n8 23/03/20 Wallonia 1758\n9 24/03/20 Wallonia 1777", "text/html": "
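<table>
<thead>
<tr><th></th><th>DATE</th><th>REGION</th><th>Total_cases</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>15/03/20</td><td>Wallonia</td><td>383</td></tr>
<tr><th>1</th><td>16/03/20</td><td>Wallonia</td><td>568</td></tr>
<tr><th>2</th><td>17/03/20</td><td>Wallonia</td><td>654</td></tr>
<tr><th>3</th><td>18/03/20</td><td>Wallonia</td><td>925</td></tr>
<tr><th>4</th><td>19/03/20</td><td>Wallonia</td><td>1245</td></tr>
<tr><th>5</th><td>20/03/20</td><td>Wallonia</td><td>1468</td></tr>
<tr><th>6</th><td>21/03/20</td><td>Wallonia</td><td>1377</td></tr>
<tr><th>7</th><td>22/03/20</td><td>Wallonia</td><td>1464</td></tr>
<tr><th>8</th><td>23/03/20</td><td>Wallonia</td><td>1758</td></tr>
<tr><th>9</th><td>24/03/20</td><td>Wallonia</td><td>1777</td></tr>
</tbody>
</table>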
\n
"}, "metadata": {}, "execution_count": 2, "output_type": "execute_result"}]}, {"metadata": {"id": "d6175c18974a42f58023affb240c02da"}, "cell_type": "code", "source": "regionData = data_df_1", "execution_count": 3, "outputs": []}, {"metadata": {"id": "64796fd153dc41c9899feec32458e8ec"}, "cell_type": "markdown", "source": "#### Drop REGION column"}, {"metadata": {"id": "d20ddda88d6e413b8fa734a7fcd64e5c"}, "cell_type": "code", "source": "data_df_1 = data_df_1.drop('REGION', axis=1)", "execution_count": 4, "outputs": []}, {"metadata": {"id": "8cc3027adbf047858764df215dfc1ca1"}, "cell_type": "markdown", "source": "#### Replace the default index with the DATE column"}, {"metadata": {"id": "882d547aee92491288ce7b0fbe1361c6"}, "cell_type": "code", "source": "data_df_1.set_index('DATE', inplace=True)", "execution_count": 5, "outputs": []}, {"metadata": {"id": "26aa6afc1d044b7c8d9517a591abef3a"}, "cell_type": "markdown", "source": "### Fix random seed for reproducibility"}, {"metadata": {"id": "58584bc38f44413c8759f6adb67b1951"}, "cell_type": "code", "source": "tensorflow.random.set_seed(1309)", "execution_count": 6, "outputs": []}, {"metadata": {"id": "d58d669e3b0e4f9b9799a3754df9937f"}, "cell_type": "markdown", "source": "### Rename the dataframe and convert the datatype"}, {"metadata": {"id": "d823cfe088ed436494add605f7f9ffef"}, "cell_type": "code", "source": "series = data_df_1\nseries = series.astype(float)", "execution_count": 7, "outputs": []}, {"metadata": {"id": "36c46721995546e69cb0097defe800d2"}, "cell_type": "markdown", "source": "### Plot the data"}, {"metadata": {"id": "986f5539ec4641e0af2abd0b9971c60b"}, "cell_type": "code", "source": "plt.figure(figsize=(20,6))\nplt.plot(series.values)\nplt.show()", "execution_count": 8, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}]}, {"metadata": {"id": "d5e5c9cd3c9b44f09e375748cb9023d7"}, "cell_type": "markdown", "source": "### Normalize the data"}, {"metadata": {"id": "828692bc0a9e436184e07d24f1f898bb"}, "cell_type": "code", "source": "series = series.values\nscaler = MinMaxScaler(feature_range=(0, 1))\nseries = scaler.fit_transform(series)", "execution_count": 9, "outputs": []}, {"metadata": {"id": "19c195736c9a414e87966beba99c0388"}, "cell_type": "markdown", "source": "### Train Test Split 70:30 Ratio"}, {"metadata": {"id": "c971f04cf8d6460c893af685cb13fed5"}, "cell_type": "code", "source": "train_size = int(len(series) * 0.70)\ntest_size = len(series) - train_size\ntrain, test = series[0:train_size,:], series[train_size:len(series),:]\nprint(len(train), len(test))", "execution_count": 10, "outputs": [{"name": "stdout", "text": "518 223\n", "output_type": "stream"}]}, {"metadata": {"id": "e9eb1ee7f638470ba9b0419baab37e0f"}, "cell_type": "markdown", "source": "### Helper function to generate the dataset with input(X) & output(Y) variables"}, {"metadata": {"id": "01f605f6c04644dfa6d75ffe782b4191"}, "cell_type": "code", "source": "def create_dataset(dataset, look_back=1):\n    dataX, dataY = [], []\n    for i in range(len(dataset)-look_back-1):\n        # Input: look_back consecutive values starting at position i\n        a = dataset[i:(i+look_back), 0]\n        dataX.append(a)\n        # Target: the value that immediately follows the window\n        dataY.append(dataset[i + look_back, 0])\n    return np.array(dataX), np.array(dataY)", "execution_count": 11, "outputs": []}, {"metadata": {"id": "35016902524f47c4a3901c50d95ee145"}, "cell_type": "markdown", "source": "### Create a dataset with a look back period of 30 observations\nThis is where we convert the time series problem into a regression problem: each sample uses the previous 30 values as inputs and the value that follows them as the target."}, {"metadata": {"id": "61dc59629f84427d8c90af7a08de916f"}, "cell_type": "code", "source": "look_back = 30\ntrainX, trainY = create_dataset(train, look_back)\ntestX, testY = create_dataset(test, look_back)", "execution_count": 12, "outputs": []}, {"metadata": 
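{}, "cell_type": "markdown", "source": "The windowing step above can be sketched on a toy list. This is a minimal illustration, not the notebook's own code: `make_windows` and `toy_series` are hypothetical names, plain Python lists stand in for the NumPy arrays, and the loop keeps every complete window, whereas `create_dataset` stops one window earlier (`len(dataset)-look_back-1`), which is why 518 training rows yield 487 samples.

```python
def make_windows(values, look_back):
    # Pair each run of look_back consecutive values with the value that follows it.
    inputs, targets = [], []
    for start in range(len(values) - look_back):
        inputs.append(values[start:start + look_back])
        targets.append(values[start + look_back])
    return inputs, targets


toy_series = [10, 20, 30, 40, 50, 60]  # made-up numbers, not the Wallonia data
toy_X, toy_y = make_windows(toy_series, look_back=3)
for window, target in zip(toy_X, toy_y):
    print(window, '->', target)
```

Each row of `toy_X` plays the role of one row of `trainX`, and the matching entry of `toy_y` is the value the model is asked to predict for it."}, {"metadata": 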
{"id": "0a93a0bd8a0f40d880362267ba44effe"}, "cell_type": "markdown", "source": "### Review the shape of datasets"}, {"metadata": {"id": "45b7737b846c41488d7e4d03598aea94"}, "cell_type": "code", "source": "trainX.shape", "execution_count": 13, "outputs": [{"data": {"text/plain": "(487, 30)"}, "metadata": {}, "execution_count": 13, "output_type": "execute_result"}]}, {"metadata": {"id": "1a8f882a4dc64cac8193c8a4b6a9be94"}, "cell_type": "code", "source": "testX.shape", "execution_count": 14, "outputs": [{"data": {"text/plain": "(192, 30)"}, "metadata": {}, "execution_count": 14, "output_type": "execute_result"}]}, {"metadata": {"id": "8c9eeedcbe79482eb01d2ab17dccd676"}, "cell_type": "markdown", "source": "### Reshape the data to 3D\nThe LSTM layer expects three-dimensional input of shape (samples, time steps, features). Here that is (samples, 30, 1)."}, {"metadata": {"id": "d1e126a09526463e868011c9b2547ff6"}, "cell_type": "code", "source": "trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))\ntestX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))", "execution_count": 15, "outputs": []}, {"metadata": {"id": "2413b15abb4941df8d2c1e48529055b1"}, "cell_type": "code", "source": "trainX.shape", "execution_count": 16, "outputs": [{"data": {"text/plain": "(487, 30, 1)"}, "metadata": {}, "execution_count": 16, "output_type": "execute_result"}]}, {"metadata": {"id": "23c134fbb71544579451910acb28af8f"}, "cell_type": "markdown", "source": "### Define the LSTM model\nThe activation function determines each neuron's output from its weighted input. Rectified linear unit (ReLU) is one of the most popular activations: it returns 0 for negative inputs and passes positive inputs through unchanged, so its output never goes below 0.\n\nUnits is the number of neurons in a layer: 60 and 100 in the two LSTM layers.\n\nStateful defines whether the cell state at the end of one batch is carried over as the starting state for the next batch. It is set to False here.\n\nDropout randomly omits a fraction of a layer's outputs during training, as given by its rate (0 to 1). 
In this case we omit 15% after each LSTM layer.\n\nThe optimiser updates the network weights from the gradients computed by backpropagation so that the loss decreases. Adam is an efficient, widely used optimiser that adapts the step size for each weight."}, {"metadata": {"id": "e4abd31fb46b44c1a8c634699e148022"}, "cell_type": "markdown", "source": "Hyperparameters for the current model:\n- **train_test_split:** 0.70\n- **lookback:** 30\n- **hidden_layers:** 2\n- **units:** 60, 100\n- **dropouts:** 0.15, 0.15\n- **optimizer:** adam\n- **learning_rate:** 0.001 (default)\n- **epochs:** 25\n- **batch_size:** 32"}, {"metadata": {"id": "71067fb37f6641268bf84af22eeac41d"}, "cell_type": "code", "source": "print('LSTM Model Summary')\nmodel = Sequential()\nmodel.add(LSTM(input_shape=(trainX.shape[1], trainX.shape[2]), kernel_initializer=\"uniform\", return_sequences=True, stateful=False, units=60))\nmodel.add(Dropout(0.15))\nmodel.add(LSTM(100, kernel_initializer=\"uniform\", activation='relu',return_sequences=False))\nmodel.add(Dropout(0.15))\nmodel.add(Dense(32,kernel_initializer=\"uniform\",activation='relu'))\nmodel.add(Dense(1, activation='linear'))\n# optimizer = Adam(learning_rate=0.0006)\n# model.compile(loss=\"mean_squared_error\", optimizer=optimizer)\nmodel.compile(loss=\"mean_squared_error\", optimizer='adam')\nmodel.summary()", "execution_count": 17, "outputs": [{"name": "stdout", "text": "LSTM Model Summary\nModel: \"sequential\"\n_________________________________________________________________\n Layer (type) Output Shape Param # \n=================================================================\n lstm (LSTM) (None, 30, 60) 14880 \n \n dropout (Dropout) (None, 30, 60) 0 \n \n lstm_1 (LSTM) (None, 100) 64400 \n \n dropout_1 (Dropout) (None, 100) 0 \n \n dense (Dense) (None, 32) 3232 \n \n dense_1 (Dense) (None, 1) 33 \n \n=================================================================\nTotal params: 82,545\nTrainable params: 82,545\nNon-trainable params: 
0\n_________________________________________________________________\n", "output_type": "stream"}]}, {"metadata": {"id": "051715d0c1e84ab2ae582f5905298219"}, "cell_type": "markdown", "source": "### Parameter Calculation\nFor an LSTM layer:\n\n`params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)`\n\nFor the first LSTM layer (1 input feature, 60 units) this gives 4 * ((1 + 1) * 60 + 60^2) = 14,880, and for the second (60 inputs, 100 units) it gives 4 * ((60 + 1) * 100 + 100^2) = 64,400. Both match the model summary above.\n\n"}, {"metadata": {"id": "c815bcf26f494d769654005be9006edc"}, "cell_type": "markdown", "source": "### Optimize computation time using early stopping\n\nWe monitor the validation loss ('val_loss') and stop training if it has not improved for five consecutive epochs ('patience=5')."}, {"metadata": {"id": "d90bff143b644a4e93d8e899d1c5988d"}, "cell_type": "code", "source": "early_stopping=EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')", "execution_count": 18, "outputs": []}, {"metadata": {"id": "a04dd4e4a37140d78e8439207b9ce463"}, "cell_type": "markdown", "source": "### Fitting the model for training data"}, {"metadata": {"id": "47db82e646c54e86ad97475695c55622"}, "cell_type": "code", "source": "start = time.time()\nhistory = model.fit(trainX, trainY, batch_size=32, epochs=25, verbose=1, shuffle=False, validation_split=0.10, callbacks=[early_stopping])\nprint(\"> Compilation Time : \", time.time() - start)", "execution_count": 19, "outputs": [{"name": "stdout", "text": "Epoch 1/25\n14/14 [==============================] - 6s 138ms/step - loss: 0.0142 - val_loss: 0.0010\nEpoch 2/25\n14/14 [==============================] - 1s 97ms/step - loss: 0.0111 - val_loss: 0.0045\nEpoch 3/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0110 - val_loss: 0.0035\nEpoch 4/25\n14/14 [==============================] - 1s 101ms/step - loss: 0.0103 - val_loss: 0.0031\nEpoch 5/25\n14/14 [==============================] - 1s 102ms/step - loss: 0.0089 - val_loss: 0.0016\nEpoch 6/25\n14/14 [==============================] - 1s 101ms/step - loss: 0.0073 - val_loss: 7.0599e-04\nEpoch 7/25\n14/14 [==============================] - 1s 
97ms/step - loss: 0.0081 - val_loss: 0.0029\nEpoch 8/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0061 - val_loss: 4.8584e-04\nEpoch 9/25\n14/14 [==============================] - 1s 99ms/step - loss: 0.0052 - val_loss: 2.3361e-04\nEpoch 10/25\n14/14 [==============================] - 1s 101ms/step - loss: 0.0056 - val_loss: 0.0028\nEpoch 11/25\n14/14 [==============================] - 1s 106ms/step - loss: 0.0042 - val_loss: 1.3609e-04\nEpoch 12/25\n14/14 [==============================] - 1s 100ms/step - loss: 0.0039 - val_loss: 1.1586e-04\nEpoch 13/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0044 - val_loss: 0.0018\nEpoch 14/25\n14/14 [==============================] - 1s 99ms/step - loss: 0.0033 - val_loss: 4.8268e-05\nEpoch 15/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0027 - val_loss: 4.9645e-05\nEpoch 16/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0030 - val_loss: 7.5402e-04\nEpoch 17/25\n14/14 [==============================] - 1s 104ms/step - loss: 0.0022 - val_loss: 4.5085e-05\nEpoch 18/25\n14/14 [==============================] - 2s 110ms/step - loss: 0.0021 - val_loss: 4.2691e-05\nEpoch 19/25\n14/14 [==============================] - 2s 109ms/step - loss: 0.0022 - val_loss: 1.8926e-04\nEpoch 20/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0019 - val_loss: 4.6648e-05\nEpoch 21/25\n14/14 [==============================] - 1s 101ms/step - loss: 0.0020 - val_loss: 4.1863e-05\nEpoch 22/25\n14/14 [==============================] - 1s 96ms/step - loss: 0.0018 - val_loss: 4.3175e-05\nEpoch 23/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0015 - val_loss: 4.0031e-05\nEpoch 24/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0019 - val_loss: 8.4009e-05\nEpoch 25/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0017 - val_loss: 4.2816e-05\n> Compilation Time : 39.40610432624817\n", 
"output_type": "stream"}]}, {"metadata": {"id": "cf16354ada204b128efabce30fab4f68"}, "cell_type": "markdown", "source": "Training took about 40 seconds in this run. Both training and validation loss trend downward over the 25 epochs, and the validation loss ends below the training loss, so there is no sign of overfitting during training."}, {"metadata": {"id": "6850758269a34f74a5e5439d7dd72764"}, "cell_type": "markdown", "source": "### Create a function to calculate accuracy\nWe use Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), both computed on the normalized data. The 'accuracy' printed by the function is simply 100 minus RMSE x 100, a rough indicator rather than a classification accuracy."}, {"metadata": {"id": "3c45d320fdee4f4aafe109028749310c"}, "cell_type": "code", "source": "def model_score(model, trainX, trainY, testX, testY):\n    trainScore = model.evaluate(trainX, trainY, batch_size=32, verbose=0)\n    print('Train Score: %.5f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))\n    print('Train Accuracy: %.2f %%' % (100 - math.sqrt(trainScore)*100))\n\n    testScore = model.evaluate(testX, testY, batch_size=32, verbose=0)\n    print('Test Score: %.5f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))\n    print('Test Accuracy: %.2f %%' % (100 - math.sqrt(testScore)*100))\n    return trainScore, testScore", "execution_count": 20, "outputs": []}, {"metadata": {"id": "e3213a83f67b4377976a9af76e99cba9"}, "cell_type": "markdown", "source": "### Check the Accuracy of the model"}, {"metadata": {"id": "9253ca4e259542f98cd9fd0c0b581589"}, "cell_type": "code", "source": "model_score(model, trainX, trainY, testX, testY)", "execution_count": 21, "outputs": [{"name": "stdout", "text": "Train Score: 0.00134 MSE (0.04 RMSE)\nTrain Accuracy: 96.34 %\nTest Score: 0.01155 MSE (0.11 RMSE)\nTest Accuracy: 89.25 %\n", "output_type": "stream"}, {"data": {"text/plain": "(0.0013373465044423938, 0.011548556387424469)"}, "metadata": {}, "execution_count": 21, "output_type": "execute_result"}]}, {"metadata": {"id": "1c629badaaee4d6c94ea839691ed90aa"}, "cell_type": "markdown", "source": "The Root Mean Squared Error (RMSE) is about 0.04 on the training data and about 0.11 on the test data (on the normalized 0 to 1 scale), so the model fits the training period more closely than the test period. 
That gap is worth keeping in mind when reading the forecasts.\n\nBy the RMSE-based measure above, the score is about 96% on the training data and about 89% on the test data."}, {"metadata": {"id": "eccf42416a8a45968fc738bfbd08932e"}, "cell_type": "markdown", "source": "### Review the learning of training & validation loss (error evaluation)"}, {"metadata": {"id": "c1a7b07f166d432f897df3f032868a03"}, "cell_type": "code", "source": "'''Review the learning'''\n\nplt.plot(history.history['loss']) # Train\nplt.plot(history.history['val_loss']) # Validation\nplt.show()", "execution_count": 22, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}]}, {"metadata": {"id": "d3efe96f75d14d4c8957fbe61d5dbeaf"}, "cell_type": "markdown", "source": "The loss keeps decreasing throughout training, which suggests the network is not suffering from vanishing gradients. The gated design of LSTM cells is intended to mitigate that problem."}, {"metadata": {"id": "e6f251c98b594f90b233091181d38be9"}, "cell_type": "markdown", "source": "### Get the configuration of the model\nThis shows all the parameters available for each layer and the values that have been chosen."}, {"metadata": {"id": "7ce92d3acb1c4a018fd9a7fbd3eee2f2"}, "cell_type": "code", "source": "model.get_config()", "execution_count": 23, "outputs": [{"data": {"text/plain": "{'name': 'sequential',\n 'layers': [{'class_name': 'InputLayer',\n 'config': {'batch_input_shape': (None, 30, 1),\n 'dtype': 'float32',\n 'sparse': False,\n 'ragged': False,\n 'name': 'lstm_input'}},\n {'class_name': 'LSTM',\n 'config': {'name': 'lstm',\n 'trainable': True,\n 'batch_input_shape': (None, 30, 1),\n 'dtype': 'float32',\n 'return_sequences': True,\n 'return_state': False,\n 'go_backwards': False,\n 'stateful': False,\n 'unroll': False,\n 'time_major': False,\n 'units': 60,\n 'activation': 'tanh',\n 'recurrent_activation': 'hard_sigmoid',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'recurrent_initializer': {'class_name': 'Orthogonal',\n 'config': {'gain': 1.0, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'unit_forget_bias': True,\n 'kernel_regularizer': None,\n 'recurrent_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'recurrent_constraint': None,\n 'bias_constraint': None,\n 'dropout': 0.0,\n 'recurrent_dropout': 0.0,\n 'implementation': 1}},\n {'class_name': 'Dropout',\n 'config': {'name': 'dropout',\n 'trainable': True,\n 'dtype': 'float32',\n 
'rate': 0.15,\n 'noise_shape': None,\n 'seed': None}},\n {'class_name': 'LSTM',\n 'config': {'name': 'lstm_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'return_sequences': False,\n 'return_state': False,\n 'go_backwards': False,\n 'stateful': False,\n 'unroll': False,\n 'time_major': False,\n 'units': 100,\n 'activation': 'relu',\n 'recurrent_activation': 'hard_sigmoid',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'recurrent_initializer': {'class_name': 'Orthogonal',\n 'config': {'gain': 1.0, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'unit_forget_bias': True,\n 'kernel_regularizer': None,\n 'recurrent_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'recurrent_constraint': None,\n 'bias_constraint': None,\n 'dropout': 0.0,\n 'recurrent_dropout': 0.0,\n 'implementation': 1}},\n {'class_name': 'Dropout',\n 'config': {'name': 'dropout_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'rate': 0.15,\n 'noise_shape': None,\n 'seed': None}},\n {'class_name': 'Dense',\n 'config': {'name': 'dense',\n 'trainable': True,\n 'dtype': 'float32',\n 'units': 32,\n 'activation': 'relu',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'kernel_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'bias_constraint': None}},\n {'class_name': 'Dense',\n 'config': {'name': 'dense_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'units': 1,\n 'activation': 'linear',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'GlorotUniform',\n 'config': {'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'kernel_regularizer': None,\n 'bias_regularizer': None,\n 
'activity_regularizer': None,\n 'kernel_constraint': None,\n 'bias_constraint': None}}]}"}, "metadata": {}, "execution_count": 23, "output_type": "execute_result"}]}, {"metadata": {"id": "90c16b9e423c47ce964c97671ad947c6"}, "cell_type": "markdown", "source": "### Create a function to plot predicted vs actual values "}, {"metadata": {"id": "23c7df9f-6951-46c1-b5fc-1ce017d616a0"}, "cell_type": "code", "source": "def plot_the_results(predicted_data, true_data, prediction_len):\n    fig = plt.figure(facecolor='white', figsize=(16,8))\n    ax = fig.add_subplot(111)\n    ax.plot(true_data, label='True Data')\n    for i, data in enumerate(predicted_data):\n        # Shift each predicted sequence to its position on the x-axis\n        padding = [None for p in range(i * prediction_len)]\n        plt.plot(padding + data, label='Prediction')\n        plt.plot(padding + data, 'b^')\n    plt.show()", "execution_count": 24, "outputs": []}, {"metadata": {"id": "3d9a954976f742d18b5579a009a72d92"}, "cell_type": "markdown", "source": "### Create a function to predict future values"}, {"metadata": {"id": "e69c634bc5764c5d836cb69e84e194d3"}, "cell_type": "code", "source": "def predict_the_sequences(model, data, window_size, prediction_len):\n    prediction_seqs = []\n    for i in range(int(len(data)/prediction_len)):\n        curr_frame = data[i*prediction_len]\n        predicted = []\n        for j in range(prediction_len):\n            predicted.append(model.predict(curr_frame[newaxis,:,:])[0,0])\n            # Slide the window: drop the oldest value, append the newest prediction\n            curr_frame = curr_frame[1:]\n            curr_frame = np.insert(curr_frame, [window_size-1], predicted[-1], axis=0)\n        prediction_seqs.append(predicted)\n    return prediction_seqs", "execution_count": 25, "outputs": []}, {"metadata": {"id": "3f69fe49e53a4d828ea6a23ed34c417d"}, "cell_type": "markdown", "source": "### Predict future values & plot the results\nHere `prediction_len` is 1, so each prediction is the single next value after its 30-observation input window.\nTo forecast further ahead from each window, increase `prediction_len`: with 2 the function also predicts the following step (t+1) by feeding its own output back in, and with 3 it continues to t+2."}, {"metadata": {"id": "b94874c3be1241ed8ee0743f30dc53f9"}, "cell_type": "code", "source": 
"predictions = predict_the_sequences(model, testX, look_back, 1)\n\nplot_the_results(predictions, testY, 1)", "execution_count": 26, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {}, "output_type": "display_data"}]}, {"metadata": {"id": "f20574d659d248acb8e22d7c48aa4766"}, "cell_type": "markdown", "source": "The model is able to capture the overall pattern in the data. The fit could be improved further by tuning the hyperparameters, but the goal here is to demonstrate the methodology."}, {"metadata": {"id": "9b33aead24da44808d524565de55eaee"}, "cell_type": "markdown", "source": "### Predictions for next 7 days"}, {"metadata": {"id": "63c0f4630da94826bd8a12670d7e3462"}, "cell_type": "code", "source": "predictions7 = predict_the_sequences(model, testX, look_back, 7)", "execution_count": 27, "outputs": []}, {"metadata": {"id": "9bec4249-6b91-4c19-a840-f11f80fdf20f"}, "cell_type": "markdown", "source": "### Denormalize the predicted values and save the data\nDenormalize the predicted output, convert it to a dataframe and print the results"}, {"metadata": {"id": "9b5f9ee0e5d74bd09bc47644a247ddfc"}, "cell_type": "code", "source": "predictionValues = scaler.inverse_transform(predictions7)\nresults = pd.DataFrame(np.round(predictionValues[-1:]))", "execution_count": 28, "outputs": []}, {"metadata": {"id": "a759ad93ab5845de928683260dd1152a"}, "cell_type": "code", "source": "results_list = results.values.tolist()", "execution_count": 29, "outputs": []}, {"metadata": {"id": "e87293121b334aa2918637bdb3ca572c"}, "cell_type": "code", "source": "results_list = [int(i) for i in results_list[0]]", "execution_count": 30, "outputs": []}, {"metadata": {"id": "28b2a5d9bef341f6b2a01380e1ac3f7b"}, "cell_type": "code", "source": "last_date = data_df_1.iloc[-1].name\n# Parse DD/MM/YY explicitly so the two-digit year is read as 2022, not as year 22\nlastDate = datetime.datetime.strptime(last_date, '%d/%m/%y')\nnext_date = lastDate + datetime.timedelta(days=1)\nnext_date = next_date.strftime('%d/%m/%y')", "execution_count": 31, "outputs": []}, {"metadata": {"id": "3ec0523b96834fdf8a6e7eaef77a1e5f"}, "cell_type": "code", "source": "next_7_days = 
pd.date_range(start=next_date, periods=7, freq='D')\nnext_7_days_df = pd.DataFrame({'DATE': next_7_days.tolist(), 'Prediction': results_list})", "execution_count": 32, "outputs": []}, {"metadata": {"id": "a2314aa4a9cb47428fd391ecfa6206cc"}, "cell_type": "code", "source": "next_7_days_df['DATE'] = next_7_days_df['DATE'].dt.strftime('%d/%m/%y')", "execution_count": 33, "outputs": []}, {"metadata": {"id": "270406a57eba418d87283de44d9ca160"}, "cell_type": "code", "source": "next_7_days_df.set_index('DATE', inplace=True)", "execution_count": 34, "outputs": []}, {"metadata": {"id": "c4ff1899b636481e9753ae3d7569c58c"}, "cell_type": "code", "source": "finalDf = pd.DataFrame(data_df_1.tail(7))", "execution_count": 35, "outputs": []}, {"metadata": {"id": "144c727f1c8f4d5683f719895cd3e8a4", "scrolled": true}, "cell_type": "code", "source": "next_7_days_df.reset_index()", "execution_count": 36, "outputs": [{"data": {"text/plain": " DATE Prediction\n0 26/03/22 1859\n1 27/03/22 1964\n2 28/03/22 1974\n3 29/03/22 1994\n4 30/03/22 2035\n5 31/03/22 2099\n6 01/04/22 2176", "text/html": "
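<table>
<thead>
<tr><th></th><th>DATE</th><th>Prediction</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>26/03/22</td><td>1859</td></tr>
<tr><th>1</th><td>27/03/22</td><td>1964</td></tr>
<tr><th>2</th><td>28/03/22</td><td>1974</td></tr>
<tr><th>3</th><td>29/03/22</td><td>1994</td></tr>
<tr><th>4</th><td>30/03/22</td><td>2035</td></tr>
<tr><th>5</th><td>31/03/22</td><td>2099</td></tr>
<tr><th>6</th><td>01/04/22</td><td>2176</td></tr>
</tbody>
</table>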
\n
"}, "metadata": {}, "execution_count": 36, "output_type": "execute_result"}]}, {"metadata": {"id": "61c48b0533764d0d9772e25f133da03f", "scrolled": true}, "cell_type": "code", "source": "finalDf.reset_index()", "execution_count": 37, "outputs": [{"data": {"text/plain": " DATE Total_cases\n0 19/03/22 1222\n1 20/03/22 727\n2 21/03/22 2948\n3 22/03/22 2782\n4 23/03/22 2750\n5 24/03/22 1371\n6 25/03/22 33", "text/html": "
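<table>
<thead>
<tr><th></th><th>DATE</th><th>Total_cases</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>19/03/22</td><td>1222</td></tr>
<tr><th>1</th><td>20/03/22</td><td>727</td></tr>
<tr><th>2</th><td>21/03/22</td><td>2948</td></tr>
<tr><th>3</th><td>22/03/22</td><td>2782</td></tr>
<tr><th>4</th><td>23/03/22</td><td>2750</td></tr>
<tr><th>5</th><td>24/03/22</td><td>1371</td></tr>
<tr><th>6</th><td>25/03/22</td><td>33</td></tr>
</tbody>
</table>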
\n
"}, "metadata": {}, "execution_count": 37, "output_type": "execute_result"}]}, {"metadata": {"id": "d034f2a42e6a4619a4988ac18f38efaf"}, "cell_type": "code", "source": "next_7_days_df = next_7_days_df.astype({'Prediction': int})", "execution_count": 38, "outputs": []}, {"metadata": {"id": "f1b700b4cb4e49c3a4a09799d550c9c7"}, "cell_type": "code", "source": "finalDf = finalDf.astype({'Total_cases': int})", "execution_count": 39, "outputs": []}, {"metadata": {"id": "b9ddab06655d448681a3378f8ef30f1e"}, "cell_type": "code", "source": "next_7_days_df.to_csv('_buffer_next7.csv')\nfinalDf.to_csv('_buffer_actual7.csv')\nregionData.to_csv('_buffer_region.csv')", "execution_count": 40, "outputs": []}, {"metadata": {"id": "26a0f751e89c43b69c39802f8ecef8d0"}, "cell_type": "code", "source": "next_7_days_df = pd.read_csv('_buffer_next7.csv')\nfinalDf = pd.read_csv('_buffer_actual7.csv')\nregionData = pd.read_csv('_buffer_region.csv')", "execution_count": 41, "outputs": []}, {"metadata": {"id": "0c8d1f4b9ee1489189460e8111734a6b"}, "cell_type": "code", "source": "next_7_days_df['DATE'] = pd.to_datetime(pd.Series(next_7_days_df['DATE']), format=\"%d/%m/%y\")\nfinalDf['DATE'] = pd.to_datetime(pd.Series(finalDf['DATE']), format=\"%d/%m/%y\")\nregionData['DATE'] = pd.to_datetime(pd.Series(regionData['DATE']), format=\"%d/%m/%y\")", "execution_count": 42, "outputs": []}, {"metadata": {"id": "e15614f0129b430986e20b87d43d68f7"}, "cell_type": "code", "source": "finalDf = pd.concat([finalDf,next_7_days_df], axis=1)", "execution_count": 43, "outputs": []}, {"metadata": {"id": "22dff2d1fa67457591d140ddb2fae6f2"}, "cell_type": "markdown", "source": "### Store the prediction data to Amazon S3 and Cloud Pak for Data Project Assets"}, {"metadata": {"id": "3dc78ebed0d54c5c8e0ee87969638e7c"}, "cell_type": "code", "source": "filename = 'wallonia-next7Prediction.csv'", "execution_count": 44, "outputs": []}, {"metadata": {"id": "eb3abdcef497430e86715377d483d949"}, "cell_type": "code", "source": 
"originalData = \"Wallonia.csv\"", "execution_count": 45, "outputs": []}, {"metadata": {"id": "9893203e8878410d8c3d4a6c3e2754ed"}, "cell_type": "code", "source": "from ibm_watson_studio_lib import access_project_or_space\nwslib = access_project_or_space()\nAWS_S3_credentials = wslib.get_connection(AWS_S3_data_request['connection_name'])", "execution_count": 46, "outputs": []}, {"metadata": {"id": "f366caf305e240e98729f38e606a2989"}, "cell_type": "code", "source": "s3 = boto3.resource(\n service_name = \"s3\",\n region_name = AWS_S3_credentials['region'],\n aws_access_key_id = AWS_S3_credentials['access_key'],\n aws_secret_access_key = AWS_S3_credentials['secret_key']\n)", "execution_count": 47, "outputs": []}, {"metadata": {"id": "6cd04b37703348ae8514ee4352327cfd"}, "cell_type": "code", "source": "bucket = AWS_S3_credentials['bucket']\ncsv_buffer = StringIO()\nnext_7_days_df.to_csv(csv_buffer, index=False)\ns3.Object(bucket, \"model_output/\"+filename).put(Body=csv_buffer.getvalue())\ncsv_buffer = StringIO()", "execution_count": 48, "outputs": []}, {"metadata": {"id": "653bef89e5944897a8e89670734a03b5"}, "cell_type": "code", "source": "next_7_days_df.to_csv(filename, index=False)\n\nwith open(filename, 'rb') as z:\n data = io.BytesIO(z.read())\n project.save_data(filename, data, set_project_asset=True, overwrite=True)", "execution_count": 49, "outputs": []}, {"metadata": {"id": "50bb81b39b1b4010b42a7c04a69c1100"}, "cell_type": "code", "source": "regionData.to_csv(originalData, index=False)\n\nwith open(originalData, 'rb') as z:\n data = io.BytesIO(z.read())\n project.save_data(originalData, data, set_project_asset=True, overwrite=True)", "execution_count": 50, "outputs": []}, {"metadata": {"id": "e72ae7b8003b4e0e97cc7012c7971fdf"}, "cell_type": "markdown", "source": "### Store Model Loss Data to S3 and Assets"}, {"metadata": {"id": "43e0f53d3d0f47bda24e85d3b09207e0"}, "cell_type": "code", "source": "errorevaluation = pd.DataFrame({'index': [int(i) for i in 
range(len(history.history['loss']))], 'loss': history.history['loss'], 'val_loss': history.history['val_loss']})", "execution_count": 51, "outputs": []}, {"metadata": {"id": "4dfc1090967942df901ecced937dcc20"}, "cell_type": "code", "source": "errorFilename = 'wallonia-errorEvaluation.csv'", "execution_count": 52, "outputs": []}, {"metadata": {"id": "0bc8127ca5e244608d0aa1f4a8ccb8b5"}, "cell_type": "code", "source": "csv_buffer = StringIO()\nerrorevaluation.to_csv(csv_buffer, index=False)\ns3.Object(bucket, \"model_output/\"+errorFilename).put(Body=csv_buffer.getvalue())", "execution_count": null, "outputs": []}, {"metadata": {"id": "fb1c15e1c56d479ab91adba5d6b54e3d"}, "cell_type": "code", "source": "errorevaluation.to_csv(errorFilename, index=False)\n\nwith open(errorFilename, 'rb') as z:\n data = io.BytesIO(z.read())\n project.save_data(errorFilename, data, set_project_asset=True, overwrite=True)", "execution_count": 54, "outputs": []}, {"metadata": {"id": "1afdebecb5ef46b28d545e4979eb472c"}, "cell_type": "markdown", "source": "### Store Actual vs Predicted Data to S3 and Assets"}, {"metadata": {"id": "54206734ea0c44ada75ee4d7694be9d5"}, "cell_type": "code", "source": "actualVsPredicted = pd.DataFrame({'index': [int(i) for i in range(len(predictions))], 'actual': testY, 'prediction': list(itertools.chain.from_iterable(predictions))})", "execution_count": 55, "outputs": []}, {"metadata": {"id": "08c49c7512e74069879864a83d9aeb01"}, "cell_type": "code", "source": "actualVsPredictedFilename = 'wallonia-actualVsPredicted.csv'", "execution_count": 56, "outputs": []}, {"metadata": {"id": "a2315f06a55f4fd3846f43240a8e995b"}, "cell_type": "code", "source": "csv_buffer = StringIO()\nactualVsPredicted.to_csv(csv_buffer, index=False)\ns3.Object(bucket, \"model_output/\"+actualVsPredictedFilename).put(Body=csv_buffer.getvalue())", "execution_count": null, "outputs": []}, {"metadata": {"id": "df4e1ddeff384f498e1de9330d239ae4"}, "cell_type": "code", "source": 
"actualVsPredicted.to_csv(actualVsPredictedFilename, index=False)\n\nwith open(actualVsPredictedFilename, 'rb') as z:\n data = io.BytesIO(z.read())\n project.save_data(actualVsPredictedFilename, data, set_project_asset=True, overwrite=True)", "execution_count": 58, "outputs": []}, {"metadata": {"id": "ca8848ca75004e108771d8c617a73a7f"}, "cell_type": "markdown", "source": "

Want to learn more?

\n\nThe AutoAI graphical tool in Watson Studio analyzes your data and discovers data transformations, algorithms, and parameter settings that work best for your predictive modeling problem. AutoAI displays the results as model candidate pipelines ranked on a leaderboard for you to choose from.\n\nAlso, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio\n\n

Thanks for completing this Lab!

\n\n

Author: Manoj Jahgirdar & Sharath Kumar RK

\n\n"}], "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3.9", "language": "python"}, "language_info": {"name": "python", "version": "3.9.7", "mimetype": "text/x-python", "codemirror_mode": {"name": "ipython", "version": 3}, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py"}}, "nbformat": 4, "nbformat_minor": 1}