{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine Learning University - Decision Trees and Ensemble Models\n",
"\n",
"\n",
"## Final Project\n",
"\n",
"Here is the breakdown of this notebook:\n",
"\n",
"1. Read the datasets (Given)\n",
"2. Train a model (Implement)\n",
"    * Exploratory Data Analysis\n",
"    * Select features to build the model\n",
"    * Data processing\n",
"    * Model training\n",
"3. Make predictions on the test dataset (Implement)\n",
"\n",
"__Austin Animal Center Dataset__:\n",
"\n",
"In this notebook, we work with pet adoption data from the __Austin Animal Center__. We joined two datasets that cover the intake and outcome of animals. The intake data is available [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Intakes/wter-evkm) and the outcome data is available [here](https://data.austintexas.gov/Health-and-Community-Services/Austin-Animal-Center-Outcomes/9t4d-g238). We want you to __predict whether a pet's stay at the animal center lasts more than 30 days__, as recorded in the __Time at Center__ field described below.\n",
"\n",
"__Dataset schema:__\n",
"- __Pet ID__ - Unique ID of the pet\n",
"- __Outcome Type__ - State of the pet at the time the outcome was recorded\n",
"- __Sex upon Outcome__ - Sex of the pet at outcome\n",
"- __Name__ - Name of the pet\n",
"- __Found Location__ - Location where the pet was found before it entered the center\n",
"- __Intake Type__ - Circumstances that brought the pet to the center\n",
"- __Intake Condition__ - Health condition of the pet when it entered the center\n",
"- __Pet Type__ - Type of pet\n",
"- __Sex upon Intake__ - Sex of the pet when it entered the center\n",
"- __Breed__ - Breed of the pet\n",
"- __Color__ - Color of the pet\n",
"- __Age upon Intake Days__ - Age of the pet, in days, when it entered the center\n",
"- __Time at Center__ - Length of stay at the center (0 = less than 30 days; 1 = more than 30 days). This is the value to predict; it is not included in the test dataset."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -q -r ../../requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Read the datasets (Given)\n",
"\n",
"Let's read the datasets into dataframes, using Pandas."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The shape of the training dataset is: (71538, 13)\n",
"The shape of the test dataset is: (23846, 12)\n"
]
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
" \n",
"training_data = pd.read_csv('../../data/final_project/training.csv')\n",
"test_data = pd.read_csv('../../data/final_project/test_features.csv')\n",
"\n",
"print('The shape of the training dataset is:', training_data.shape)\n",
"print('The shape of the test dataset is:', test_data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Train a model (Implement)\n",
"\n",
" * Exploratory Data Analysis\n",
" * Select features to build the model\n",
" * Data processing\n",
" * Model training\n",
"\n",
"### 2.1 Exploratory Data Analysis\n",
"\n",
"We look at the number of rows and columns and some simple statistics of the dataset."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Pet ID Outcome Type Sex upon Outcome Name \\\n",
"0 A745079 Transfer Unknown NaN \n",
"1 A801765 Transfer Intact Female NaN \n",
"2 A667965 Transfer Neutered Male NaN \n",
"3 A687551 Transfer Intact Male NaN \n",
"4 A773004 Adoption Neutered Male *Boris \n",
"\n",
" Found Location Intake Type Intake Condition \\\n",
"0 7920 Old Lockhart in Travis (TX) Stray Normal \n",
"1 5006 Table Top in Austin (TX) Stray Normal \n",
"2 14100 Thermal Dr in Austin (TX) Stray Normal \n",
"3 5811 Cedardale Dr in Austin (TX) Stray Normal \n",
"4 Highway 290 And Arterial A in Austin (TX) Stray Normal \n",
"\n",
" Pet Type Sex upon Intake Breed Color \\\n",
"0 Cat Unknown Domestic Shorthair Mix Blue \n",
"1 Cat Intact Female Domestic Shorthair Brown Tabby/White \n",
"2 Dog Neutered Male Chihuahua Shorthair Mix Brown/Tan \n",
"3 Cat Intact Male Domestic Shorthair Mix Brown Tabby \n",
"4 Dog Intact Male Chihuahua Shorthair Mix Tricolor/Cream \n",
"\n",
" Age upon Intake Days Time at Center \n",
"0 3 0 \n",
"1 28 0 \n",
"2 1825 0 \n",
"3 28 0 \n",
"4 365 0 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Implement here\n",
"\n",
"training_data.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Pet ID Outcome Type Sex upon Outcome Name \\\n",
"0 A782657 Adoption Spayed Female NaN \n",
"1 A804622 Adoption Neutered Male NaN \n",
"2 A786693 Return to Owner Neutered Male Zeus \n",
"3 A693330 Adoption Spayed Female Hope \n",
"4 A812431 Adoption Neutered Male NaN \n",
"\n",
" Found Location Intake Type \\\n",
"0 1911 Dear Run Drive in Austin (TX) Stray \n",
"1 702 Grand Canyon in Austin (TX) Stray \n",
"2 Austin (TX) Public Assist \n",
"3 Levander Loop & Airport Blvd in Austin (TX) Stray \n",
"4 Austin (TX) Owner Surrender \n",
"\n",
" Intake Condition Pet Type Sex upon Intake Breed \\\n",
"0 Normal Dog Intact Female Labrador Retriever Mix \n",
"1 Normal Dog Intact Male Boxer/Anatol Shepherd \n",
"2 Normal Dog Neutered Male Australian Cattle Dog/Pit Bull \n",
"3 Normal Dog Intact Female Miniature Poodle \n",
"4 Injured Cat Intact Male Domestic Shorthair \n",
"\n",
" Color Age upon Intake Days \n",
"0 Black 60 \n",
"1 Brown/Tricolor 60 \n",
"2 Black/White 3285 \n",
"3 Gray 1825 \n",
"4 Blue/White 210 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Implement here\n",
"\n",
"test_data.head()"
]
},
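{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simple statistics mentioned above can be sketched as follows. This is a minimal sketch, not the notebook's prescribed solution: `summarize` is a hypothetical helper, and `sample` is a small stand-in built from a few columns of the five rows printed by `training_data.head()`, so the cell runs without the CSV files. In the notebook itself, `summarize(training_data, target='Time at Center')` would report on the full training set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical helper (not part of the original notebook): collects the\n",
"# row/column counts and a few simple statistics for a dataframe.\n",
"def summarize(df, target=None):\n",
"    summary = {\n",
"        'n_rows': df.shape[0],\n",
"        'n_columns': df.shape[1],\n",
"        'missing_per_column': df.isna().sum().to_dict(),\n",
"        'unique_per_column': df.nunique().to_dict(),\n",
"    }\n",
"    if target is not None and target in df.columns:\n",
"        # Share of each class in the label column\n",
"        summary['target_share'] = df[target].value_counts(normalize=True).to_dict()\n",
"    return summary\n",
"\n",
"# Stand-in sample: a few columns of the rows shown by training_data.head()\n",
"sample = pd.DataFrame({\n",
"    'Pet ID': ['A745079', 'A801765', 'A667965', 'A687551', 'A773004'],\n",
"    'Outcome Type': ['Transfer', 'Transfer', 'Transfer', 'Transfer', 'Adoption'],\n",
"    'Name': [None, None, None, None, '*Boris'],\n",
"    'Pet Type': ['Cat', 'Cat', 'Dog', 'Cat', 'Dog'],\n",
"    'Age upon Intake Days': [3, 28, 1825, 28, 365],\n",
"    'Time at Center': [0, 0, 0, 0, 0],\n",
"})\n",
"\n",
"summary = summarize(sample, target='Time at Center')\n",
"print('Rows:', summary['n_rows'], 'Columns:', summary['n_columns'])\n",
"print('Missing values in Name:', summary['missing_per_column']['Name'])"
]
},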
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Select features to build the model"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Implement here\n",
"\n",
"# numerical_features = ...\n",
"# categorical_features = ...\n",
"# text_features = ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Data Processing"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Implement here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 Model training"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Implement here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Make predictions on the test dataset (Implement)\n",
"\n",
"Use the test set to make predictions with the trained model."
]
},
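{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way the processing, training, and prediction steps could fit together is sketched below. It assumes scikit-learn is installed (the notebook does not name a library), uses an assumed feature split, and trains on synthetic toy rows invented for illustration, so the predictions carry no meaning for the real data. In the notebook, `training_data` and `test_data` would take the place of `toy_train` and `toy_test`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import OneHotEncoder\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Assumed feature split (an illustration, not the notebook's chosen features)\n",
"numerical_features = ['Age upon Intake Days']\n",
"categorical_features = ['Pet Type', 'Intake Type']\n",
"label = 'Time at Center'\n",
"features = numerical_features + categorical_features\n",
"\n",
"# Synthetic toy rows, invented only so the sketch runs without the CSV files\n",
"toy_train = pd.DataFrame({\n",
"    'Age upon Intake Days': [3, 28, 1825, 365, 730, 60, 2190, 90],\n",
"    'Pet Type': ['Cat', 'Cat', 'Dog', 'Dog', 'Dog', 'Cat', 'Dog', 'Cat'],\n",
"    'Intake Type': ['Stray', 'Stray', 'Stray', 'Owner Surrender',\n",
"                    'Stray', 'Stray', 'Owner Surrender', 'Stray'],\n",
"    'Time at Center': [0, 0, 1, 0, 1, 0, 1, 0],\n",
"})\n",
"toy_test = pd.DataFrame({\n",
"    'Age upon Intake Days': [45, 2000],\n",
"    'Pet Type': ['Cat', 'Dog'],\n",
"    'Intake Type': ['Stray', 'Public Assist'],  # category unseen in training\n",
"})\n",
"\n",
"# Impute missing values; one-hot encode categories, ignoring unseen ones\n",
"preprocessor = ColumnTransformer([\n",
"    ('num', SimpleImputer(strategy='mean'), numerical_features),\n",
"    ('cat', Pipeline([\n",
"        ('impute', SimpleImputer(strategy='constant', fill_value='missing')),\n",
"        ('onehot', OneHotEncoder(handle_unknown='ignore')),\n",
"    ]), categorical_features),\n",
"])\n",
"\n",
"# Chain preprocessing and a small decision tree into one estimator\n",
"model = Pipeline([\n",
"    ('prep', preprocessor),\n",
"    ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),\n",
"])\n",
"\n",
"model.fit(toy_train[features], toy_train[label])\n",
"test_predictions = model.predict(toy_test[features])\n",
"print(test_predictions)"
]
},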
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Implement here\n",
"\n",
"# test_predictions = ..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_mxnet_p36",
"language": "python",
"name": "conda_mxnet_p36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}