{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning Accelerator - Natural Language Processing - Lecture 3\n",
"\n",
"## Final Project: Neural Networks and Recurrent Neural Networks (RNNs) for the IMDB Movie Review Dataset\n",
"\n",
"__Dataset:__ Sentiment (positive or negative) analysis of movie reviews. The dataset was originally published here: http://ai.stanford.edu/~amaas/data/sentiment/\n",
"\n",
"We continue to work on our final project dataset. This time, you will explore how well Neural Networks, Recurrent Neural Networks (RNNs), and their GRU and LSTM variants predict the sentiment of review texts. If you are interested in trying Transformers, this is a good place for that too.\n",
"\n",
"Use the notebooks from the class to implement the model, then train and test it on the corresponding datasets.\n",
"You can follow these steps:\n",
"1. Read training-test data (Given)\n",
"2. Train a classifier (Implement)\n",
"3. Make predictions on your test dataset (Implement)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -r ../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Reading the dataset\n",
"\n",
"We will use the __pandas__ library to read our dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### __Training data:__"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" text label\n",
"0 This movie makes me want to throw up every tim... 0\n",
"1 Listening to the director's commentary confirm... 0\n",
"2 One of the best Tarzan films is also one of it... 1\n",
"3 Valentine is now one of my favorite slasher fi... 1\n",
"4 No mention if Ann Rivers Siddons adapted the m... 0"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"train_df = pd.read_csv('../data/final_project/imdb_train.csv', header=0)\n",
"train_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### __Test data:__"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" text label\n",
"0 What I hoped for (or even expected) was the we... 0\n",
"1 Garden State must rate amongst the most contri... 0\n",
"2 There is a lot wrong with this film. I will no... 1\n",
"3 To qualify my use of \"realistic\" in the summar... 1\n",
"4 Dirty War is absolutely one of the best politi... 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"test_df = pd.read_csv('../data/final_project/imdb_test.csv', header=0)\n",
"test_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Train a Classifier"
]
},
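{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible starting point is sketched below: a minimal single-layer LSTM classifier in PyTorch. It is a sketch under several assumptions, not a reference solution. PyTorch is assumed to be installed (the kernel name suggests it is), the regex tokenizer, vocabulary size, sequence length and all hyperparameters are illustrative and untuned, and `tokenize`, `encode`, `LSTMClassifier` and `predict_labels` are hypothetical helpers introduced only for this example.\n",
"\n",
"Reviews are left-padded so that the LSTM's final hidden state is computed after the real tokens rather than after a run of padding. To try a GRU instead, swap `nn.LSTM` for `nn.GRU` and change the unpacking in `forward` to `_, h_n = self.lstm(embedded)`, since a GRU returns no cell state."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: single-layer LSTM sentiment classifier (assumes PyTorch; hyperparameters are illustrative)\n",
"import re\n",
"from collections import Counter\n",
"\n",
"import torch\n",
"import torch.nn as nn\n",
"from torch.utils.data import DataLoader, TensorDataset\n",
"\n",
"MAX_LEN = 200      # truncate or pad every review to this many tokens\n",
"MAX_VOCAB = 20000  # keep only the most frequent training tokens\n",
"PAD, UNK = 0, 1    # reserved ids for padding and out-of-vocabulary tokens\n",
"\n",
"def tokenize(text):\n",
"    # Drop the HTML line breaks found in IMDB reviews, lowercase, keep alphanumeric runs\n",
"    return re.findall(r'[a-z0-9]+', text.replace('<br />', ' ').lower())\n",
"\n",
"# Build the vocabulary from the training split only\n",
"counter = Counter(tok for text in train_df['text'] for tok in tokenize(text))\n",
"vocab = {tok: i + 2 for i, (tok, _) in enumerate(counter.most_common(MAX_VOCAB))}\n",
"\n",
"def encode(texts):\n",
"    # Map tokens to ids, keep at most MAX_LEN of them and left-pad with PAD\n",
"    ids = torch.full((len(texts), MAX_LEN), PAD, dtype=torch.long)\n",
"    for row, text in enumerate(texts):\n",
"        tokens = [vocab.get(tok, UNK) for tok in tokenize(text)][:MAX_LEN]\n",
"        if tokens:\n",
"            ids[row, MAX_LEN - len(tokens):] = torch.tensor(tokens)\n",
"    return ids\n",
"\n",
"class LSTMClassifier(nn.Module):\n",
"    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):\n",
"        super().__init__()\n",
"        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD)\n",
"        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)\n",
"        self.out = nn.Linear(hidden_dim, 1)\n",
"\n",
"    def forward(self, x):\n",
"        embedded = self.embedding(x)         # (batch, seq_len, embed_dim)\n",
"        _, (h_n, _) = self.lstm(embedded)    # h_n: (1, batch, hidden_dim)\n",
"        return self.out(h_n[-1]).squeeze(1)  # one logit per review\n",
"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"X_train = encode(train_df['text'].tolist())\n",
"y_train = torch.tensor(train_df['label'].values, dtype=torch.float32)\n",
"loader = DataLoader(TensorDataset(X_train, y_train), batch_size=64, shuffle=True)\n",
"\n",
"model = LSTMClassifier(len(vocab) + 2).to(device)  # +2 for PAD and UNK\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)\n",
"loss_fn = nn.BCEWithLogitsLoss()  # binary labels: 0 = negative, 1 = positive\n",
"\n",
"for epoch in range(3):\n",
"    model.train()\n",
"    total_loss = 0.0\n",
"    for xb, yb in loader:\n",
"        xb, yb = xb.to(device), yb.to(device)\n",
"        optimizer.zero_grad()\n",
"        loss = loss_fn(model(xb), yb)\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"        total_loss += loss.item() * len(xb)\n",
"    print(f'epoch {epoch + 1}: train loss {total_loss / len(X_train):.4f}')\n",
"\n",
"def predict_labels(texts, batch_size=256):\n",
"    # Return 0/1 predictions; a positive logit means a positive review\n",
"    model.eval()\n",
"    X = encode(texts)\n",
"    preds = []\n",
"    with torch.no_grad():\n",
"        for i in range(0, len(X), batch_size):\n",
"            logits = model(X[i:i + batch_size].to(device))\n",
"            preds.append((logits > 0).long().cpu())\n",
"    return torch.cat(preds).numpy()\n",
"\n",
"test_pred = predict_labels(test_df['text'].tolist())\n",
"print('test accuracy:', (test_pred == test_df['label'].values).mean())"
]
},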
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Implement this"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Make predictions on your test dataset"
]
},
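{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whatever model you build, it helps to compare its test predictions with a trivial reference point. The sketch below assumes scikit-learn is installed. `evaluate` is a hypothetical helper that prints accuracy, the confusion matrix and per-class precision, recall and F1; it is demonstrated on a majority-class `DummyClassifier`, which ignores the review text entirely, so a useful sentiment classifier should clearly beat it.\n",
"\n",
"To score your own model, call `evaluate(test_df['label'], your_predictions)`, where `your_predictions` is a placeholder for your model's 0/1 predictions on `test_df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reusable evaluation helper plus a majority-class baseline (assumes scikit-learn)\n",
"import numpy as np\n",
"from sklearn.dummy import DummyClassifier\n",
"from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n",
"\n",
"def evaluate(y_true, y_pred):\n",
"    # Overall accuracy, confusion matrix (rows = true, columns = predicted), per-class metrics\n",
"    print('accuracy:', accuracy_score(y_true, y_pred))\n",
"    print(confusion_matrix(y_true, y_pred))\n",
"    print(classification_report(y_true, y_pred, zero_division=0))\n",
"\n",
"# The baseline ignores its features, so a dummy column of zeros stands in for the reviews\n",
"baseline = DummyClassifier(strategy='most_frequent')\n",
"baseline.fit(np.zeros((len(train_df), 1)), train_df['label'])\n",
"evaluate(test_df['label'], baseline.predict(np.zeros((len(test_df), 1))))"
]
},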
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Implement this"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p39",
"language": "python",
"name": "conda_pytorch_p39"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}