Notebook 1: Predict future COVID-19 cases for the Brussels region with a Long Short-Term Memory (LSTM) model
"}, {"metadata": {"id": "064903c5e517420896382c112885aeef"}, "cell_type": "markdown", "source": "In this lab exercise, you will learn a popular opensource machine learning algorithm, Long Short-Term Memory (LSTM). You will use this time-series algorithm to build a model from historical data of total COVID-19 cases. Then you use the trained model to predict the future COVID-19 cases."}, {"metadata": {"id": "b63f82d05b984223974a40be6249f092"}, "cell_type": "markdown", "source": "### Import required libraries"}, {"metadata": {"id": "13ad35bac1f84a0785d69d1a7753b4fb"}, "cell_type": "code", "source": "import boto3\nimport numpy as np \nimport pandas as pd \nfrom keras.layers.core import Dense, Dropout\nfrom keras.layers.recurrent import LSTM\nfrom keras.models import Sequential\nfrom tensorflow.keras.optimizers import Adam\nimport math, time\nfrom sklearn.preprocessing import MinMaxScaler\nimport matplotlib.pyplot as plt\nfrom numpy import newaxis\nfrom keras.callbacks import EarlyStopping\nimport tensorflow\nfrom io import StringIO\nimport datetime\nimport io\nimport itertools\nfrom project_lib import Project\nproject = Project.access()\n%matplotlib inline", "execution_count": 1, "outputs": []}, {"metadata": {"id": "54ef6c9c375945638d46540986ef68a5"}, "cell_type": "markdown", "source": "### Load the dataset from Amazon S3 into pandas dataframe\n\n\n\n- Click on **find and add data (0100)** button on top right. \n- Click on **Connections** tab.\n- You will see your connection variable. Click on **Insert to code** and select **pandas DataFrame**.\n- Select the **ts-brussels-grouped.csv** dataset from the connection variable.\n\n>Note: you can add the comment `# @hidden_cell` in the below code cell. 
Cloud Pak for Data will automatically hide the cell before sharing it."}, {"metadata": {"id": "10f3b48312fd423b804adb94be85c6f0"}, "cell_type": "code", "source": "", "execution_count": 2, "outputs": [{"data": {"text/plain": " DATE REGION Total_cases\n0 15/03/20 Brussels 119\n1 16/03/20 Brussels 238\n2 17/03/20 Brussels 219\n3 18/03/20 Brussels 346\n4 19/03/20 Brussels 583\n5 20/03/20 Brussels 504\n6 21/03/20 Brussels 683\n7 22/03/20 Brussels 719\n8 23/03/20 Brussels 515\n9 24/03/20 Brussels 1058", "text/html": "
"}, "metadata": {}, "execution_count": 2, "output_type": "execute_result"}]}, {"metadata": {"id": "d6175c18974a42f58023affb240c02da"}, "cell_type": "code", "source": "regionData = data_df_1", "execution_count": 3, "outputs": []}, {"metadata": {"id": "64796fd153dc41c9899feec32458e8ec"}, "cell_type": "markdown", "source": "#### Drop REGION column"}, {"metadata": {"id": "d20ddda88d6e413b8fa734a7fcd64e5c"}, "cell_type": "code", "source": "data_df_1 = data_df_1.drop('REGION', axis=1)", "execution_count": 4, "outputs": []}, {"metadata": {"id": "8cc3027adbf047858764df215dfc1ca1"}, "cell_type": "markdown", "source": "#### Drop the index column and set DATE column as index column"}, {"metadata": {"id": "882d547aee92491288ce7b0fbe1361c6"}, "cell_type": "code", "source": "data_df_1.set_index('DATE', inplace=True)", "execution_count": 5, "outputs": []}, {"metadata": {"id": "26aa6afc1d044b7c8d9517a591abef3a"}, "cell_type": "markdown", "source": "### Fix random seed for reproducibility"}, {"metadata": {"id": "58584bc38f44413c8759f6adb67b1951"}, "cell_type": "code", "source": "tensorflow.random.set_seed(1309)", "execution_count": 6, "outputs": []}, {"metadata": {"id": "d58d669e3b0e4f9b9799a3754df9937f"}, "cell_type": "markdown", "source": "### Rename the dataframe and convert the datatype"}, {"metadata": {"id": "d823cfe088ed436494add605f7f9ffef"}, "cell_type": "code", "source": "series = data_df_1\nseries = series.astype(float)", "execution_count": 7, "outputs": []}, {"metadata": {"id": "36c46721995546e69cb0097defe800d2"}, "cell_type": "markdown", "source": "### Plot the data to see the current trends in COVID-19 cases in Brussels Region"}, {"metadata": {"id": "986f5539ec4641e0af2abd0b9971c60b"}, "cell_type": "code", "source": "plt.figure(figsize=(20,6))\nplt.plot(series.values)\nplt.show()", "execution_count": 8, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}]}, {"metadata": {"id": "d5e5c9cd3c9b44f09e375748cb9023d7"}, "cell_type": "markdown", "source": "### Normalize the data"}, {"metadata": {"id": "828692bc0a9e436184e07d24f1f898bb"}, "cell_type": "code", "source": "series = series.values\nscaler = MinMaxScaler(feature_range=(0, 1))\nseries = scaler.fit_transform(series)", "execution_count": 9, "outputs": []}, {"metadata": {"id": "19c195736c9a414e87966beba99c0388"}, "cell_type": "markdown", "source": "### Train Test Split 70:30 Ratio"}, {"metadata": {"id": "c971f04cf8d6460c893af685cb13fed5"}, "cell_type": "code", "source": "train_size = int(len(series) * 0.70)\ntest_size = len(series) - train_size\ntrain, test = series[0:train_size,:], series[train_size:len(series),:]\nprint(len(train), len(test))", "execution_count": 10, "outputs": [{"name": "stdout", "text": "516 222\n", "output_type": "stream"}]}, {"metadata": {"id": "e9eb1ee7f638470ba9b0419baab37e0f"}, "cell_type": "markdown", "source": "### Helper function to generate the dataset with input(X) & output(Y) variables"}, {"metadata": {"id": "01f605f6c04644dfa6d75ffe782b4191"}, "cell_type": "code", "source": "def create_dataset(dataset, look_back=1):\n dataX, dataY = [], []\n for i in range(len(dataset)-look_back-1):\n a = dataset[i:(i+look_back), 0]\n dataX.append(a)\n dataY.append(dataset[i + look_back, 0])\n return np.array(dataX), np.array(dataY)", "execution_count": 11, "outputs": []}, {"metadata": {"id": "35016902524f47c4a3901c50d95ee145"}, "cell_type": "markdown", "source": "### Create a dataset with a look back period of 30 observations\nThis is where we convert the time series problem into a regression problem"}, {"metadata": {"id": "61dc59629f84427d8c90af7a08de916f"}, "cell_type": "code", "source": "look_back = 30\ntrainX, trainY = create_dataset(train, look_back)\ntestX, testY = create_dataset(test, look_back)", "execution_count": 12, "outputs": []}, {"metadata": 
{"id": "0a93a0bd8a0f40d880362267ba44effe"}, "cell_type": "markdown", "source": "### Review the shape of datasets"}, {"metadata": {"id": "45b7737b846c41488d7e4d03598aea94"}, "cell_type": "code", "source": "trainX.shape", "execution_count": 13, "outputs": [{"data": {"text/plain": "(485, 30)"}, "metadata": {}, "execution_count": 13, "output_type": "execute_result"}]}, {"metadata": {"id": "1a8f882a4dc64cac8193c8a4b6a9be94"}, "cell_type": "code", "source": "testX.shape", "execution_count": 14, "outputs": [{"data": {"text/plain": "(191, 30)"}, "metadata": {}, "execution_count": 14, "output_type": "execute_result"}]}, {"metadata": {"id": "8c9eeedcbe79482eb01d2ab17dccd676"}, "cell_type": "markdown", "source": "### Reshape the data to 3D\nThe LSTM model requires the input data to be three dimensional"}, {"metadata": {"id": "d1e126a09526463e868011c9b2547ff6"}, "cell_type": "code", "source": "trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))\ntestX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))", "execution_count": 15, "outputs": []}, {"metadata": {"id": "2413b15abb4941df8d2c1e48529055b1"}, "cell_type": "code", "source": "trainX.shape", "execution_count": 16, "outputs": [{"data": {"text/plain": "(485, 30, 1)"}, "metadata": {}, "execution_count": 16, "output_type": "execute_result"}]}, {"metadata": {"id": "23c134fbb71544579451910acb28af8f"}, "cell_type": "markdown", "source": "### Define the LSTM model\nActivation function will activate the neurons for the learning. Rectified linear unit (ReLu) is one of the most popular activations because the output does not go beyond 0.\n\nUnits will be the number of neurons in the input & hidden layers.\n\nStateful is where we define whether the previous information has to be remembered or not.\n\nDropout is where we omit random neurons for each layer as per the value (0 to 1). 
In this case we omit 15% of the neurons after each LSTM layer.\n\nThe optimiser updates the network weights, using gradients computed by backpropagation, to bring the predictions closer to the desired outcome. Adam is an efficient and widely used optimiser."}, {"metadata": {"id": "151cff5f77064178b6158f11b724d61d"}, "cell_type": "markdown", "source": "Hyperparameters for the current model:\n- **train_test_split:** 0.70\n- **lookback:** 30\n- **hidden_layers:** 2\n- **units:** 55, 100\n- **dropouts:** 0.15, 0.15\n- **optimizer:** adam\n- **learning_rate:** 0.001 (default)\n- **epochs:** 25\n- **batch_size:** 32"}, {"metadata": {"id": "71067fb37f6641268bf84af22eeac41d"}, "cell_type": "code", "source": "print('LSTM Model Summary')\nmodel = Sequential()\nmodel.add(LSTM(input_shape=(trainX.shape[1], trainX.shape[2]), kernel_initializer=\"uniform\", return_sequences=True, stateful=False, units=55))\nmodel.add(Dropout(0.15))\nmodel.add(LSTM(100, kernel_initializer=\"uniform\", activation='relu',return_sequences=False))\nmodel.add(Dropout(0.15))\nmodel.add(Dense(32,kernel_initializer=\"uniform\",activation='relu'))\nmodel.add(Dense(1, activation='linear'))\n# optimizer = Adam(learning_rate=0.0005)\n# model.compile(loss=\"mean_squared_error\", optimizer=optimizer)\nmodel.compile(loss=\"mean_squared_error\", optimizer='adam')\nmodel.summary()", "execution_count": 17, "outputs": [{"name": "stdout", "text": "LSTM Model Summary\nModel: \"sequential\"\n_________________________________________________________________\n Layer (type) Output Shape Param # \n=================================================================\n lstm (LSTM) (None, 30, 55) 12540 \n \n dropout (Dropout) (None, 30, 55) 0 \n \n lstm_1 (LSTM) (None, 100) 62400 \n \n dropout_1 (Dropout) (None, 100) 0 \n \n dense (Dense) (None, 32) 3232 \n \n dense_1 (Dense) (None, 1) 33 \n \n=================================================================\nTotal params: 78,205\nTrainable params: 78,205\nNon-trainable params: 
0\n_________________________________________________________________\n", "output_type": "stream"}]}, {"metadata": {"id": "051715d0c1e84ab2ae582f5905298219"}, "cell_type": "markdown", "source": "### Parameter Calculation\nFor each LSTM layer:\n\nparams = 4 * ((size_of_input + 1) * size_of_output) + 4 * size_of_output^2\n\nFor example, the first LSTM layer has 1 input feature and 55 units: 4 * ((1 + 1) * 55) + 4 * 55^2 = 440 + 12100 = 12540. The second LSTM layer has 55 inputs and 100 units: 4 * ((55 + 1) * 100) + 4 * 100^2 = 22400 + 40000 = 62400. Both values match the model summary.\n\n"}, {"metadata": {"id": "c815bcf26f494d769654005be9006edc"}, "cell_type": "markdown", "source": "### Optimize computation time using early stopping\n\nWe monitor the validation loss ('val_loss') and end the training if it has not improved for five consecutive epochs ('patience=5')."}, {"metadata": {"id": "d90bff143b644a4e93d8e899d1c5988d"}, "cell_type": "code", "source": "early_stopping=EarlyStopping(monitor='val_loss', patience=5, verbose=1, mode='auto')", "execution_count": 18, "outputs": []}, {"metadata": {"id": "a04dd4e4a37140d78e8439207b9ce463"}, "cell_type": "markdown", "source": "### Fitting the model for training data"}, {"metadata": {"id": "47db82e646c54e86ad97475695c55622"}, "cell_type": "code", "source": "start = time.time()\nhistory = model.fit(trainX, trainY, batch_size=32, epochs=25, verbose=1, shuffle=False, validation_split=0.10, callbacks=[early_stopping])\nprint(\"> Compilation Time : \", time.time() - start)", "execution_count": 19, "outputs": [{"name": "stdout", "text": "Epoch 1/25\n14/14 [==============================] - 5s 122ms/step - loss: 0.0142 - val_loss: 0.0019\nEpoch 2/25\n14/14 [==============================] - 1s 93ms/step - loss: 0.0108 - val_loss: 0.0033\nEpoch 3/25\n14/14 [==============================] - 1s 94ms/step - loss: 0.0106 - val_loss: 0.0023\nEpoch 4/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0099 - val_loss: 0.0023\nEpoch 5/25\n14/14 [==============================] - 1s 94ms/step - loss: 0.0083 - val_loss: 7.4788e-04\nEpoch 6/25\n14/14 [==============================] - 1s 93ms/step - loss: 0.0068 - val_loss: 5.4045e-04\nEpoch 7/25\n14/14 [==============================] - 1s 
93ms/step - loss: 0.0051 - val_loss: 4.2679e-04\nEpoch 8/25\n14/14 [==============================] - 1s 94ms/step - loss: 0.0048 - val_loss: 2.6871e-04\nEpoch 9/25\n14/14 [==============================] - 1s 93ms/step - loss: 0.0043 - val_loss: 3.2288e-04\nEpoch 10/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0058 - val_loss: 0.0016\nEpoch 11/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0034 - val_loss: 1.9711e-04\nEpoch 12/25\n14/14 [==============================] - 1s 93ms/step - loss: 0.0034 - val_loss: 2.1064e-04\nEpoch 13/25\n14/14 [==============================] - 1s 92ms/step - loss: 0.0039 - val_loss: 6.1753e-04\nEpoch 14/25\n14/14 [==============================] - 1s 93ms/step - loss: 0.0031 - val_loss: 1.3190e-04\nEpoch 15/25\n14/14 [==============================] - 1s 92ms/step - loss: 0.0026 - val_loss: 1.3799e-04\nEpoch 16/25\n14/14 [==============================] - 1s 92ms/step - loss: 0.0024 - val_loss: 1.3126e-04\nEpoch 17/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0023 - val_loss: 1.2932e-04\nEpoch 18/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0022 - val_loss: 1.2908e-04\nEpoch 19/25\n14/14 [==============================] - 1s 99ms/step - loss: 0.0021 - val_loss: 1.5420e-04\nEpoch 20/25\n14/14 [==============================] - 1s 95ms/step - loss: 0.0020 - val_loss: 1.2738e-04\nEpoch 21/25\n14/14 [==============================] - 1s 100ms/step - loss: 0.0020 - val_loss: 1.5354e-04\nEpoch 22/25\n14/14 [==============================] - 1s 99ms/step - loss: 0.0020 - val_loss: 1.2904e-04\nEpoch 23/25\n14/14 [==============================] - 1s 98ms/step - loss: 0.0021 - val_loss: 1.4152e-04\nEpoch 24/25\n14/14 [==============================] - 1s 94ms/step - loss: 0.0020 - val_loss: 1.5942e-04\nEpoch 25/25\n14/14 [==============================] - 1s 94ms/step - loss: 0.0021 - val_loss: 1.3354e-04\nEpoch 00025: early stopping\n> Compilation Time 
: 37.11146950721741\n", "output_type": "stream"}]}, {"metadata": {"id": "cf16354ada204b128efabce30fab4f68"}, "cell_type": "markdown", "source": "Training took about 37 seconds (the value printed as 'Compilation Time' is the wall-clock time of model.fit). Both losses decreased over the epochs, ending near 0.002 for training and near 0.0001 for validation, which is a good sign."}, {"metadata": {"id": "6850758269a34f74a5e5439d7dd72764"}, "cell_type": "markdown", "source": "### Create a function to calculate accuracy\nWe use Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), both computed on the normalized (0 to 1) data. The 'accuracy' printed below is 100 - RMSE * 100, a rough indicator rather than a standard metric."}, {"metadata": {"id": "3c45d320fdee4f4aafe109028749310c"}, "cell_type": "code", "source": "def model_score(model, trainX, trainY, testX, testY):\n    trainScore = model.evaluate(trainX, trainY, batch_size=32, verbose=0)\n    print('Train Score: %.5f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))\n    print('Train Accuracy: %.2f %%' % (100 - math.sqrt(trainScore)*100))\n\n    testScore = model.evaluate(testX, testY, batch_size=32, verbose=0)\n    print('Test Score: %.5f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))\n    print('Test Accuracy: %.2f %%' % (100 - math.sqrt(testScore)*100))\n    return trainScore, testScore", "execution_count": 20, "outputs": []}, {"metadata": {"id": "e3213a83f67b4377976a9af76e99cba9"}, "cell_type": "markdown", "source": "### Check the Accuracy of the model"}, {"metadata": {"id": "9253ca4e259542f98cd9fd0c0b581589"}, "cell_type": "code", "source": "model_score(model, trainX, trainY, testX, testY)", "execution_count": 21, "outputs": [{"name": "stdout", "text": "Train Score: 0.00165 MSE (0.04 RMSE)\nTrain Accuracy: 95.93 %\nTest Score: 0.00987 MSE (0.10 RMSE)\nTest Accuracy: 90.07 %\n", "output_type": "stream"}, {"data": {"text/plain": "(0.0016540562501177192, 0.009867317043244839)"}, "metadata": {}, "execution_count": 21, "output_type": "execute_result"}]}, {"metadata": {"id": "1c629badaaee4d6c94ea839691ed90aa"}, "cell_type": "markdown", "source": "The Root Mean Squared Error (RMSE) is higher on the test data (0.10) than on the training data (0.04), so the model fits the training data somewhat better than unseen data. Both errors are still small on the normalized 0 to 1 scale.\n\nBy the measure defined above (100 - RMSE * 100), the accuracy is about 96% on the training data and about 90% on the test data."}, {"metadata": {"id": "eccf42416a8a45968fc738bfbd08932e"}, "cell_type": "markdown", "source": "### Review the learning of training & validation loss (error evaluation)"}, {"metadata": {"id": "c1a7b07f166d432f897df3f032868a03"}, "cell_type": "code", "source": "'''Review the learning'''\n\nplt.plot(history.history['loss']) # Train\nplt.plot(history.history['val_loss']) # Validation\nplt.show()", "execution_count": 22, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}]}, {"metadata": {"id": "d3efe96f75d14d4c8957fbe61d5dbeaf"}, "cell_type": "markdown", "source": "There's no vanishing gradient descent as the LSTM model with optimal configueration has taken care of the gradient descent problem."}, {"metadata": {"id": "e6f251c98b594f90b233091181d38be9"}, "cell_type": "markdown", "source": "### Get the configuration of the model\nThis will give us an idea about all the parameters available and which ones have been choosen."}, {"metadata": {"id": "7ce92d3acb1c4a018fd9a7fbd3eee2f2"}, "cell_type": "code", "source": "model.get_config()", "execution_count": 23, "outputs": [{"data": {"text/plain": "{'name': 'sequential',\n 'layers': [{'class_name': 'InputLayer',\n 'config': {'batch_input_shape': (None, 30, 1),\n 'dtype': 'float32',\n 'sparse': False,\n 'ragged': False,\n 'name': 'lstm_input'}},\n {'class_name': 'LSTM',\n 'config': {'name': 'lstm',\n 'trainable': True,\n 'batch_input_shape': (None, 30, 1),\n 'dtype': 'float32',\n 'return_sequences': True,\n 'return_state': False,\n 'go_backwards': False,\n 'stateful': False,\n 'unroll': False,\n 'time_major': False,\n 'units': 55,\n 'activation': 'tanh',\n 'recurrent_activation': 'hard_sigmoid',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'recurrent_initializer': {'class_name': 'Orthogonal',\n 'config': {'gain': 1.0, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'unit_forget_bias': True,\n 'kernel_regularizer': None,\n 'recurrent_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'recurrent_constraint': None,\n 'bias_constraint': None,\n 'dropout': 0.0,\n 'recurrent_dropout': 0.0,\n 'implementation': 1}},\n {'class_name': 'Dropout',\n 'config': {'name': 'dropout',\n 'trainable': True,\n 'dtype': 'float32',\n 
'rate': 0.15,\n 'noise_shape': None,\n 'seed': None}},\n {'class_name': 'LSTM',\n 'config': {'name': 'lstm_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'return_sequences': False,\n 'return_state': False,\n 'go_backwards': False,\n 'stateful': False,\n 'unroll': False,\n 'time_major': False,\n 'units': 100,\n 'activation': 'relu',\n 'recurrent_activation': 'hard_sigmoid',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'recurrent_initializer': {'class_name': 'Orthogonal',\n 'config': {'gain': 1.0, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'unit_forget_bias': True,\n 'kernel_regularizer': None,\n 'recurrent_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'recurrent_constraint': None,\n 'bias_constraint': None,\n 'dropout': 0.0,\n 'recurrent_dropout': 0.0,\n 'implementation': 1}},\n {'class_name': 'Dropout',\n 'config': {'name': 'dropout_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'rate': 0.15,\n 'noise_shape': None,\n 'seed': None}},\n {'class_name': 'Dense',\n 'config': {'name': 'dense',\n 'trainable': True,\n 'dtype': 'float32',\n 'units': 32,\n 'activation': 'relu',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'RandomUniform',\n 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'kernel_regularizer': None,\n 'bias_regularizer': None,\n 'activity_regularizer': None,\n 'kernel_constraint': None,\n 'bias_constraint': None}},\n {'class_name': 'Dense',\n 'config': {'name': 'dense_1',\n 'trainable': True,\n 'dtype': 'float32',\n 'units': 1,\n 'activation': 'linear',\n 'use_bias': True,\n 'kernel_initializer': {'class_name': 'GlorotUniform',\n 'config': {'seed': None}},\n 'bias_initializer': {'class_name': 'Zeros', 'config': {}},\n 'kernel_regularizer': None,\n 'bias_regularizer': None,\n 
'activity_regularizer': None,\n 'kernel_constraint': None,\n 'bias_constraint': None}}]}"}, "metadata": {}, "execution_count": 23, "output_type": "execute_result"}]}, {"metadata": {"id": "90c16b9e423c47ce964c97671ad947c6"}, "cell_type": "markdown", "source": "### Create a function to plot predicted vs actual values "}, {"metadata": {"id": "23c7df9f-6951-46c1-b5fc-1ce017d616a0"}, "cell_type": "code", "source": "def plot_the_results(predicted_data, true_data, prediction_len):\n    fig = plt.figure(facecolor='white', figsize=(16,8))\n    ax = fig.add_subplot(111)\n    ax.plot(true_data, label='True Data')\n    # Pad each predicted sequence so it starts at its position on the x-axis\n    for i, data in enumerate(predicted_data):\n        padding = [None for p in range(i * prediction_len)]\n        plt.plot(padding + data, label='Prediction')\n        plt.plot(padding + data, 'b^')\n    plt.show()", "execution_count": 24, "outputs": []}, {"metadata": {"id": "3d9a954976f742d18b5579a009a72d92"}, "cell_type": "markdown", "source": "### Create a function to predict future values"}, {"metadata": {"id": "e69c634bc5764c5d836cb69e84e194d3"}, "cell_type": "code", "source": "def predict_the_sequences(model, data, window_size, prediction_len):\n    prediction_seqs = []\n    for i in range(int(len(data)/prediction_len)):\n        curr_frame = data[i*prediction_len]\n        predicted = []\n        for j in range(prediction_len):\n            predicted.append(model.predict(curr_frame[newaxis,:,:])[0,0])\n            # Slide the window: drop the oldest value, append the latest prediction\n            curr_frame = curr_frame[1:]\n            curr_frame = np.insert(curr_frame, [window_size-1], predicted[-1], axis=0)\n        prediction_seqs.append(predicted)\n    return prediction_seqs", "execution_count": 25, "outputs": []}, {"metadata": {"id": "3f69fe49e53a4d828ea6a23ed34c417d"}, "cell_type": "markdown", "source": "### Predict future values & plot the results\n Here prediction_len is 1, so each value is predicted from a window of the 30 preceding actual values.\n To also predict one step further (t+1), change the prediction_len parameter to 2;\n for t+2, set prediction_len to 3. Each extra step is predicted from a window that includes the earlier predictions."}, {"metadata": {"id": "b94874c3be1241ed8ee0743f30dc53f9"}, "cell_type": "code", "source": 
"predictions = predict_the_sequences(model, testX, look_back, 1)\n\nplot_the_results(predictions, testY, 1)", "execution_count": 26, "outputs": [{"data": {"text/plain": "
", "image/png": "\n"}, "metadata": {}, "output_type": "display_data"}]}, {"metadata": {"id": "f20574d659d248acb8e22d7c48aa4766"}, "cell_type": "markdown", "source": "We can observe that the model is able to catch the pattern in the data. This can be further improved by changing the hyper parameters however we are demonstrating the methodology."}, {"metadata": {"id": "9b33aead24da44808d524565de55eaee"}, "cell_type": "markdown", "source": "### Predictions for next 7 days"}, {"metadata": {"id": "63c0f4630da94826bd8a12670d7e3462"}, "cell_type": "code", "source": "predictions7 = predict_the_sequences(model, testX, look_back, 7)", "execution_count": 27, "outputs": []}, {"metadata": {"id": "9bec4249-6b91-4c19-a840-f11f80fdf20f"}, "cell_type": "markdown", "source": "### Denormalize the predicted values and save the data\nDenormalize & Convert the predicted output to a dataframe & print the results"}, {"metadata": {"id": "9b5f9ee0e5d74bd09bc47644a247ddfc"}, "cell_type": "code", "source": "predictionValues = scaler.inverse_transform(predictions7)\nresults = pd.DataFrame(np.round(predictionValues[-1:]))", "execution_count": 28, "outputs": []}, {"metadata": {"id": "a759ad93ab5845de928683260dd1152a"}, "cell_type": "code", "source": "results_list = results.values.tolist()", "execution_count": 29, "outputs": []}, {"metadata": {"id": "e87293121b334aa2918637bdb3ca572c"}, "cell_type": "code", "source": "results_list = [int(i) for i in results_list[0]]", "execution_count": 30, "outputs": []}, {"metadata": {"id": "28b2a5d9bef341f6b2a01380e1ac3f7b"}, "cell_type": "code", "source": "last_date = data_df_1.iloc[-1].name\nlastDate = datetime.datetime(int(last_date.split('/')[2]), int(last_date.split('/')[1]), int(last_date.split('/')[0]))\nnext_date = lastDate + datetime.timedelta(days=1)\nnext_date = next_date.strftime('%d/%m/%y')", "execution_count": 31, "outputs": []}, {"metadata": {"id": "3ec0523b96834fdf8a6e7eaef77a1e5f"}, "cell_type": "code", "source": "next_7_days = 
pd.date_range(start=next_date, periods=7, freq='D')\nnext_7_days_df = pd.DataFrame({'DATE': next_7_days.tolist(), 'Prediction': results_list})", "execution_count": 32, "outputs": []}, {"metadata": {"id": "a2314aa4a9cb47428fd391ecfa6206cc"}, "cell_type": "code", "source": "next_7_days_df['DATE'] = next_7_days_df['DATE'].dt.strftime('%d/%m/%y')", "execution_count": 33, "outputs": []}, {"metadata": {"id": "270406a57eba418d87283de44d9ca160"}, "cell_type": "code", "source": "next_7_days_df.set_index('DATE', inplace=True)", "execution_count": 34, "outputs": []}, {"metadata": {"id": "c4ff1899b636481e9753ae3d7569c58c"}, "cell_type": "code", "source": "finalDf = pd.DataFrame(data_df_1.tail(7))", "execution_count": 35, "outputs": []}, {"metadata": {"id": "144c727f1c8f4d5683f719895cd3e8a4", "scrolled": true}, "cell_type": "code", "source": "next_7_days_df.reset_index()", "execution_count": 36, "outputs": [{"data": {"text/plain": " DATE Prediction\n0 25/03/22 754\n1 26/03/22 773\n2 27/03/22 783\n3 28/03/22 794\n4 29/03/22 805\n5 30/03/22 823\n6 31/03/22 842", "text/html": "
\n\nThe AutoAI graphical tool in Watson Studio analyzes your data and discovers data transformations, algorithms, and parameter settings that work best for your predictive modeling problem. AutoAI displays the results as model candidate pipelines ranked on a leaderboard for you to choose from.\n\nAlso, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at Watson Studio\n\n