{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Demo Notebook for Online Retail Analysis\n",
"\n",
"#### [download notebook](https://github.com/opensearch-project/opensearch-py-ml/blob/main/docs/source/examples/online_retail_analysis.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 0: Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# import this to stop opensearch-py-ml from yelling every time a DataFrame connection made\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:06.764412Z",
"iopub.status.busy": "2021-12-15T20:25:06.755567Z",
"iopub.status.idle": "2021-12-15T20:25:07.316950Z",
"shell.execute_reply": "2021-12-15T20:25:07.316561Z"
}
},
"outputs": [],
"source": [
"# imports to demonstrate DataFrame support\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import opensearch_py_ml as oml\n",
"from opensearchpy import OpenSearch\n",
"\n",
"# Import standard test settings for consistent results\n",
"from opensearch_py_ml.conftest import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Setup clients"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"CLUSTER_URL = 'https://localhost:9200'\n",
"\n",
"def get_os_client(cluster_url = CLUSTER_URL,\n",
" username='admin',\n",
" password='admin'):\n",
" '''\n",
" Get OpenSearch client\n",
" :param cluster_url: cluster URL like https://ml-te-netwo-1s12ba42br23v-ff1736fa7db98ff2.elb.us-west-2.amazonaws.com:443\n",
" :return: OpenSearch client\n",
" '''\n",
" client = OpenSearch(\n",
" hosts=[cluster_url],\n",
" http_auth=(username, password),\n",
" verify_certs=False\n",
" )\n",
" return client"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"client = get_os_client()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting Started\n",
"\n",
"To get started, let's create an `opensearch_py_ml.DataFrame` by reading a csv file. This creates and populates the \n",
"`online-retail` index in the local Opensearch cluster."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:07.324283Z",
"iopub.status.busy": "2021-12-15T20:25:07.323764Z",
"iopub.status.idle": "2021-12-15T20:25:16.241379Z",
"shell.execute_reply": "2021-12-15T20:25:16.241877Z"
}
},
"outputs": [],
"source": [
"df = oml.csv_to_opensearch(\"data/online-retail.csv.gz\",\n",
" os_client=client, \n",
" os_dest_index='online-retail', \n",
" es_if_exists='replace', \n",
" os_dropna=True,\n",
" es_refresh=True,\n",
" compression='gzip',\n",
" index_col=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that the `\"_id\"` field was used to index our data frame. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:16.246737Z",
"iopub.status.busy": "2021-12-15T20:25:16.244084Z",
"iopub.status.idle": "2021-12-15T20:25:16.250080Z",
"shell.execute_reply": "2021-12-15T20:25:16.250410Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'_id'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index.os_index_field"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we can check which field from opensearch are available to our opensearch_py_ml data frame. `columns` is available as a parameter when instantiating the data frame which allows one to choose only a subset of fields from your index to be included in the data frame. Since we didn't set this parameter, we have access to all fields."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:16.254703Z",
"iopub.status.busy": "2021-12-15T20:25:16.254060Z",
"iopub.status.idle": "2021-12-15T20:25:16.256567Z",
"shell.execute_reply": "2021-12-15T20:25:16.256138Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Country', 'CustomerID', 'Description', 'InvoiceDate', 'InvoiceNo', 'Quantity', 'StockCode',\n",
" 'UnitPrice'],\n",
" dtype='object')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's see the data types of our fields. Running `df.dtypes`, we can see that opensearch field types are mapped to pandas field types."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:16.261335Z",
"iopub.status.busy": "2021-12-15T20:25:16.260762Z",
"iopub.status.idle": "2021-12-15T20:25:16.263024Z",
"shell.execute_reply": "2021-12-15T20:25:16.263323Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Country object\n",
"CustomerID float64\n",
"Description object\n",
"InvoiceDate object\n",
"InvoiceNo object\n",
"Quantity int64\n",
"StockCode object\n",
"UnitPrice float64\n",
"dtype: object"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also offer a `.os_info()` data frame method that shows all info about the underlying index. It also contains information about operations being passed from data frame methods to opensearch. More on this later."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:16.266245Z",
"iopub.status.busy": "2021-12-15T20:25:16.265860Z",
"iopub.status.idle": "2021-12-15T20:25:16.271135Z",
"shell.execute_reply": "2021-12-15T20:25:16.270816Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"os_index_pattern: online-retail\n",
"Index:\n",
" os_index_field: _id\n",
" is_source_field: False\n",
"Mappings:\n",
" capabilities:\n",
" os_field_name is_source os_dtype os_date_format pd_dtype is_searchable is_aggregatable is_scripted aggregatable_os_field_name\n",
"Country Country True keyword None object True True False Country\n",
"CustomerID CustomerID True double None float64 True True False CustomerID\n",
"Description Description True keyword None object True True False Description\n",
"InvoiceDate InvoiceDate True keyword None object True True False InvoiceDate\n",
"InvoiceNo InvoiceNo True keyword None object True True False InvoiceNo\n",
"Quantity Quantity True long None int64 True True False Quantity\n",
"StockCode StockCode True keyword None object True True False StockCode\n",
"UnitPrice UnitPrice True double None float64 True True False UnitPrice\n",
"Operations:\n",
" tasks: []\n",
" size: None\n",
" sort_params: None\n",
" _source: ['Country', 'CustomerID', 'Description', 'InvoiceDate', 'InvoiceNo', 'Quantity', 'StockCode', 'UnitPrice']\n",
" body: {}\n",
" post_processing: []\n",
"\n"
]
}
],
"source": [
"print(df.os_info())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting and Indexing Data\n",
"\n",
"Now that we understand how to create a data frame and get access to it's underlying attributes, let's see how we can select subsets of our data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### head and tail\n",
"\n",
"much like pandas, opensearch_py_ml data frames offer `.head(n)` and `.tail(n)` methods that return the first and last n rows, respectively."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:16.274779Z",
"iopub.status.busy": "2021-12-15T20:25:16.274393Z",
"iopub.status.idle": "2021-12-15T20:25:17.555325Z",
"shell.execute_reply": "2021-12-15T20:25:17.555642Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" CustomerID | \n",
" ... | \n",
" StockCode | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 85123A | \n",
" 2.55 | \n",
"
\n",
" \n",
" 1 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 71053 | \n",
" 3.39 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"2 rows × 8 columns
"
],
"text/plain": [
" Country CustomerID ... StockCode UnitPrice\n",
"0 United Kingdom 17850.0 ... 85123A 2.55\n",
"1 United Kingdom 17850.0 ... 71053 3.39\n",
"\n",
"[2 rows x 8 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:17.559534Z",
"iopub.status.busy": "2021-12-15T20:25:17.559123Z",
"iopub.status.idle": "2021-12-15T20:25:17.637500Z",
"shell.execute_reply": "2021-12-15T20:25:17.637125Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"os_index_pattern: online-retail\n",
"Index:\n",
" os_index_field: _id\n",
" is_source_field: False\n",
"Mappings:\n",
" capabilities:\n",
" os_field_name is_source os_dtype os_date_format pd_dtype is_searchable is_aggregatable is_scripted aggregatable_os_field_name\n",
"Country Country True keyword None object True True False Country\n",
"CustomerID CustomerID True double None float64 True True False CustomerID\n",
"Description Description True keyword None object True True False Description\n",
"InvoiceDate InvoiceDate True keyword None object True True False InvoiceDate\n",
"InvoiceNo InvoiceNo True keyword None object True True False InvoiceNo\n",
"Quantity Quantity True long None int64 True True False Quantity\n",
"StockCode StockCode True keyword None object True True False StockCode\n",
"UnitPrice UnitPrice True double None float64 True True False UnitPrice\n",
"Operations:\n",
" tasks: [('tail': ('sort_field': '_doc', 'count': 2)), ('head': ('sort_field': '_doc', 'count': 2)), ('tail': ('sort_field': '_doc', 'count': 2))]\n",
" size: 2\n",
" sort_params: {'_doc': 'desc'}\n",
" _source: ['Country', 'CustomerID', 'Description', 'InvoiceDate', 'InvoiceNo', 'Quantity', 'StockCode', 'UnitPrice']\n",
" body: {}\n",
" post_processing: [('sort_index'), ('head': ('count': 2)), ('tail': ('count': 2))]\n",
"\n"
]
}
],
"source": [
"print(df.tail(2).head(2).tail(2).os_info())"
]
},
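{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `tasks` list printed above records each `head` and `tail` call in order instead of running it immediately. As a rough illustration only, the chain can be replayed on plain row positions. `apply_tasks` is a hypothetical helper written for this sketch, not part of opensearch_py_ml, and it says nothing about how the library actually pushes the work to OpenSearch.\n",
"\n",
"```python\n",
"def apply_tasks(rows, tasks):\n",
"    # Replay (name, count) tasks, in order, on a list of rows.\n",
"    for name, count in tasks:\n",
"        if name == 'head':\n",
"            rows = rows[:count]\n",
"        elif name == 'tail':\n",
"            # rows[-0:] would return everything, so handle count == 0 explicitly\n",
"            rows = rows[-count:] if count > 0 else []\n",
"        else:\n",
"            raise ValueError(f'unknown task: {name}')\n",
"    return rows\n",
"\n",
"row_ids = list(range(15000))  # assumed row positions 0..14999 of the demo index\n",
"chain = [('tail', 2), ('head', 2), ('tail', 2)]\n",
"print(apply_tasks(row_ids, chain))\n",
"```\n",
"\n",
"Under these assumptions the replay ends on the last two row positions, which is consistent with the labels returned by `df.tail(2)`."
]
},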
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:17.640519Z",
"iopub.status.busy": "2021-12-15T20:25:17.640139Z",
"iopub.status.idle": "2021-12-15T20:25:18.647340Z",
"shell.execute_reply": "2021-12-15T20:25:18.646548Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" CustomerID | \n",
" ... | \n",
" StockCode | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" 14998 | \n",
" United Kingdom | \n",
" 17419.0 | \n",
" ... | \n",
" 21773 | \n",
" 1.25 | \n",
"
\n",
" \n",
" 14999 | \n",
" United Kingdom | \n",
" 17419.0 | \n",
" ... | \n",
" 22149 | \n",
" 2.10 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"2 rows × 8 columns
"
],
"text/plain": [
" Country CustomerID ... StockCode UnitPrice\n",
"14998 United Kingdom 17419.0 ... 21773 1.25\n",
"14999 United Kingdom 17419.0 ... 22149 2.10\n",
"\n",
"[2 rows x 8 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting columns\n",
"\n",
"you can also pass a list of columns to select columns from the data frame in a specified order."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:18.654238Z",
"iopub.status.busy": "2021-12-15T20:25:18.653517Z",
"iopub.status.idle": "2021-12-15T20:25:19.431749Z",
"shell.execute_reply": "2021-12-15T20:25:19.431127Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" InvoiceDate | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" United Kingdom | \n",
" 2010-12-01 08:26:00 | \n",
"
\n",
" \n",
" 1 | \n",
" United Kingdom | \n",
" 2010-12-01 08:26:00 | \n",
"
\n",
" \n",
" 2 | \n",
" United Kingdom | \n",
" 2010-12-01 08:26:00 | \n",
"
\n",
" \n",
" 3 | \n",
" United Kingdom | \n",
" 2010-12-01 08:26:00 | \n",
"
\n",
" \n",
" 4 | \n",
" United Kingdom | \n",
" 2010-12-01 08:26:00 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"5 rows × 2 columns
"
],
"text/plain": [
" Country InvoiceDate\n",
"0 United Kingdom 2010-12-01 08:26:00\n",
"1 United Kingdom 2010-12-01 08:26:00\n",
"2 United Kingdom 2010-12-01 08:26:00\n",
"3 United Kingdom 2010-12-01 08:26:00\n",
"4 United Kingdom 2010-12-01 08:26:00\n",
"\n",
"[5 rows x 2 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Country', 'InvoiceDate']].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Boolean Indexing\n",
"\n",
"we also allow you to filter the data frame using boolean indexing. Under the hood, a boolean index maps to a `terms` query that is then passed to opensearch to filter the index."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:19.440640Z",
"iopub.status.busy": "2021-12-15T20:25:19.439831Z",
"iopub.status.idle": "2021-12-15T20:25:20.066747Z",
"shell.execute_reply": "2021-12-15T20:25:20.067477Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'term': {'Country': 'Germany'}}\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" CustomerID | \n",
" ... | \n",
" StockCode | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" 1109 | \n",
" Germany | \n",
" 12662.0 | \n",
" ... | \n",
" 22809 | \n",
" 2.95 | \n",
"
\n",
" \n",
" 1110 | \n",
" Germany | \n",
" 12662.0 | \n",
" ... | \n",
" 84347 | \n",
" 2.55 | \n",
"
\n",
" \n",
" 1111 | \n",
" Germany | \n",
" 12662.0 | \n",
" ... | \n",
" 84945 | \n",
" 0.85 | \n",
"
\n",
" \n",
" 1112 | \n",
" Germany | \n",
" 12662.0 | \n",
" ... | \n",
" 22242 | \n",
" 1.65 | \n",
"
\n",
" \n",
" 1113 | \n",
" Germany | \n",
" 12662.0 | \n",
" ... | \n",
" 22244 | \n",
" 1.95 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"5 rows × 8 columns
"
],
"text/plain": [
" Country CustomerID ... StockCode UnitPrice\n",
"1109 Germany 12662.0 ... 22809 2.95\n",
"1110 Germany 12662.0 ... 84347 2.55\n",
"1111 Germany 12662.0 ... 84945 0.85\n",
"1112 Germany 12662.0 ... 22242 1.65\n",
"1113 Germany 12662.0 ... 22244 1.95\n",
"\n",
"[5 rows x 8 columns]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the construction of a boolean vector maps directly to an opensearch query\n",
"print(df['Country']=='Germany')\n",
"df[(df['Country']=='Germany')].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"we can also filter the data frame using a list of values."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:20.077022Z",
"iopub.status.busy": "2021-12-15T20:25:20.076412Z",
"iopub.status.idle": "2021-12-15T20:25:21.233013Z",
"shell.execute_reply": "2021-12-15T20:25:21.234073Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'terms': {'Country': ['Germany', 'United States']}}\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" CustomerID | \n",
" ... | \n",
" StockCode | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 85123A | \n",
" 2.55 | \n",
"
\n",
" \n",
" 1 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 71053 | \n",
" 3.39 | \n",
"
\n",
" \n",
" 2 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 84406B | \n",
" 2.75 | \n",
"
\n",
" \n",
" 3 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 84029G | \n",
" 3.39 | \n",
"
\n",
" \n",
" 4 | \n",
" United Kingdom | \n",
" 17850.0 | \n",
" ... | \n",
" 84029E | \n",
" 3.39 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"5 rows × 8 columns
"
],
"text/plain": [
" Country CustomerID ... StockCode UnitPrice\n",
"0 United Kingdom 17850.0 ... 85123A 2.55\n",
"1 United Kingdom 17850.0 ... 71053 3.39\n",
"2 United Kingdom 17850.0 ... 84406B 2.75\n",
"3 United Kingdom 17850.0 ... 84029G 3.39\n",
"4 United Kingdom 17850.0 ... 84029E 3.39\n",
"\n",
"[5 rows x 8 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(df['Country'].isin(['Germany', 'United States']))\n",
"df[df['Country'].isin(['Germany', 'United Kingdom'])].head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also combine boolean vectors to further filter the data frame."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:21.245390Z",
"iopub.status.busy": "2021-12-15T20:25:21.244737Z",
"iopub.status.idle": "2021-12-15T20:25:22.358701Z",
"shell.execute_reply": "2021-12-15T20:25:22.355150Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Country | \n",
" CustomerID | \n",
" ... | \n",
" StockCode | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"0 rows × 8 columns
"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [Country, CustomerID, Description, InvoiceDate, InvoiceNo, Quantity, StockCode, UnitPrice]\n",
"Index: []\n",
"\n",
"[0 rows x 8 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['Country']=='Germany') & (df['Quantity']>90)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using this example, let see how opensearch_py_ml translates this boolean filter to an opensearch `bool` query."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:22.383610Z",
"iopub.status.busy": "2021-12-15T20:25:22.370577Z",
"iopub.status.idle": "2021-12-15T20:25:22.390275Z",
"shell.execute_reply": "2021-12-15T20:25:22.388963Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"os_index_pattern: online-retail\n",
"Index:\n",
" os_index_field: _id\n",
" is_source_field: False\n",
"Mappings:\n",
" capabilities:\n",
" os_field_name is_source os_dtype os_date_format pd_dtype is_searchable is_aggregatable is_scripted aggregatable_os_field_name\n",
"Country Country True keyword None object True True False Country\n",
"CustomerID CustomerID True double None float64 True True False CustomerID\n",
"Description Description True keyword None object True True False Description\n",
"InvoiceDate InvoiceDate True keyword None object True True False InvoiceDate\n",
"InvoiceNo InvoiceNo True keyword None object True True False InvoiceNo\n",
"Quantity Quantity True long None int64 True True False Quantity\n",
"StockCode StockCode True keyword None object True True False StockCode\n",
"UnitPrice UnitPrice True double None float64 True True False UnitPrice\n",
"Operations:\n",
" tasks: [('boolean_filter': ('boolean_filter': {'bool': {'must': [{'term': {'Country': 'Germany'}}, {'range': {'Quantity': {'gt': 90}}}]}}))]\n",
" size: None\n",
" sort_params: None\n",
" _source: ['Country', 'CustomerID', 'Description', 'InvoiceDate', 'InvoiceNo', 'Quantity', 'StockCode', 'UnitPrice']\n",
" body: {'query': {'bool': {'must': [{'term': {'Country': 'Germany'}}, {'range': {'Quantity': {'gt': 90}}}]}}}\n",
" post_processing: []\n",
"\n"
]
}
],
"source": [
"print(df[(df['Country']=='Germany') & (df['Quantity']>90)].os_info())"
]
},
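{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `body` shown above is an ordinary OpenSearch query document, so its shape can be reproduced with plain dictionaries. The sketch below is illustrative only: `term`, `greater_than`, and `all_of` are hypothetical helpers invented here, not opensearch_py_ml functions, and the library's real translation code may be organised differently.\n",
"\n",
"```python\n",
"def term(field, value):\n",
"    # exact-match filter, the shape printed for an equality test\n",
"    return {'term': {field: value}}\n",
"\n",
"def greater_than(field, value):\n",
"    # range filter, the shape printed for a greater-than comparison\n",
"    return {'range': {field: {'gt': value}}}\n",
"\n",
"def all_of(*filters):\n",
"    # every filter must match, mirroring the & operator on boolean vectors\n",
"    return {'bool': {'must': list(filters)}}\n",
"\n",
"query = all_of(term('Country', 'Germany'), greater_than('Quantity', 90))\n",
"print({'query': query})\n",
"```\n",
"\n",
"Printing `{'query': query}` gives the same structure as the `body` entry in the `os_info()` output."
]
},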
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aggregation and Descriptive Statistics\n",
"\n",
"Let's begin to ask some questions of our data and use opensearch_py_ml to get the answers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**How many different countries are there?**"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:22.398231Z",
"iopub.status.busy": "2021-12-15T20:25:22.397459Z",
"iopub.status.idle": "2021-12-15T20:25:22.482238Z",
"shell.execute_reply": "2021-12-15T20:25:22.481338Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Country'].nunique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**What is the total sum of products ordered?**"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:22.492668Z",
"iopub.status.busy": "2021-12-15T20:25:22.491590Z",
"iopub.status.idle": "2021-12-15T20:25:22.580015Z",
"shell.execute_reply": "2021-12-15T20:25:22.578300Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"111960"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Quantity'].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Show me the sum, mean, min, and max of the qunatity and unit_price fields**"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:22.601432Z",
"iopub.status.busy": "2021-12-15T20:25:22.600117Z",
"iopub.status.idle": "2021-12-15T20:25:22.702450Z",
"shell.execute_reply": "2021-12-15T20:25:22.701499Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Quantity | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" sum | \n",
" 111960.000 | \n",
" 61548.490000 | \n",
"
\n",
" \n",
" mean | \n",
" 7.464 | \n",
" 4.103233 | \n",
"
\n",
" \n",
" max | \n",
" 2880.000 | \n",
" 950.990000 | \n",
"
\n",
" \n",
" min | \n",
" -9360.000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Quantity UnitPrice\n",
"sum 111960.000 61548.490000\n",
"mean 7.464 4.103233\n",
"max 2880.000 950.990000\n",
"min -9360.000 0.000000"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Quantity','UnitPrice']].agg(['sum', 'mean', 'max', 'min'])"
]
},
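{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the table above, the mean is just the sum divided by the number of rows. The sketch assumes the index holds 15000 rows, which is inferred from the last row label (14999) returned by `df.tail(2)`; the other numbers are copied from the output above.\n",
"\n",
"```python\n",
"total_quantity = 111960  # 'sum' row of the Quantity column above\n",
"row_count = 15000        # assumed row count, inferred from the last row label 14999\n",
"\n",
"mean_quantity = total_quantity / row_count\n",
"print(round(mean_quantity, 3))\n",
"```\n",
"\n",
"If the assumption about the row count holds, the result agrees with the `mean` row reported for `Quantity`."
]
},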
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Give me descriptive statistics for the entire data frame**"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:22.712002Z",
"iopub.status.busy": "2021-12-15T20:25:22.711114Z",
"iopub.status.idle": "2021-12-15T20:25:22.982698Z",
"shell.execute_reply": "2021-12-15T20:25:22.981770Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" CustomerID | \n",
" Quantity | \n",
" UnitPrice | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 10729.000000 | \n",
" 15000.000000 | \n",
" 15000.000000 | \n",
"
\n",
" \n",
" mean | \n",
" 15590.776680 | \n",
" 7.464000 | \n",
" 4.103233 | \n",
"
\n",
" \n",
" std | \n",
" 1764.189592 | \n",
" 85.930116 | \n",
" 20.106214 | \n",
"
\n",
" \n",
" min | \n",
" 12347.000000 | \n",
" -9360.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 25% | \n",
" 14222.689466 | \n",
" 1.000000 | \n",
" 1.250000 | \n",
"
\n",
" \n",
" 50% | \n",
" 15668.019608 | \n",
" 2.000000 | \n",
" 2.510000 | \n",
"
\n",
" \n",
" 75% | \n",
" 17218.806604 | \n",
" 6.472000 | \n",
" 4.212788 | \n",
"
\n",
" \n",
" max | \n",
" 18239.000000 | \n",
" 2880.000000 | \n",
" 950.990000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" CustomerID Quantity UnitPrice\n",
"count 10729.000000 15000.000000 15000.000000\n",
"mean 15590.776680 7.464000 4.103233\n",
"std 1764.189592 85.930116 20.106214\n",
"min 12347.000000 -9360.000000 0.000000\n",
"25% 14222.689466 1.000000 1.250000\n",
"50% 15668.019608 2.000000 2.510000\n",
"75% 17218.806604 6.472000 4.212788\n",
"max 18239.000000 2880.000000 950.990000"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NBVAL_IGNORE_OUTPUT\n",
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Show me a histogram of numeric columns**"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:23.000466Z",
"iopub.status.busy": "2021-12-15T20:25:22.999571Z",
"iopub.status.idle": "2021-12-15T20:25:23.576387Z",
"shell.execute_reply": "2021-12-15T20:25:23.576703Z"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAtUAAAEICAYAAACQ+wgHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy89olMNAAAACXBIWXMAAAsTAAALEwEAmpwYAAAlTUlEQVR4nO3df7RfdX3n++dLEH+gEn50jhiYJh0yutCMymQBjnN7z4iFgE7D3KUMHToELl25cxdabTNtoZ17uVVZCzsyFGxlJlOowTICpVq4laoZ9NwOMxdE1IqAXlIIkjSAmoANjNQw7/vH93Pka8zhnJN9cs757jwfa5119v7sz97fzzv77O/3nf397M8nVYUkSZKkffeihW6AJEmSNOpMqiVJkqSOTKolSZKkjkyqJUmSpI5MqiVJkqSOTKolSZKkjkyqpSFJdiX5mYVuhyRpIMm/T/J/LNbjSZNMqrXgkpyX5N4kzyR5LMnHkhw2D687keSXhsuq6hVV9VDb/vEkH9rf7ZCkPktSSY7bo+z/SvJHM9m/qv5VVX2w7TeeZOtejvXDdlPkyST/LclbZnI8aS6ZVGtBJVkPfBj4NeAw4GRgGfD5JC9ewKZJkkbHjVX1CuCngDuATyXJnpWSHDTvLdMBw6RaCybJq4DfBt5bVZ+tqh9W1RbgLOBngH+x593iPe9SJLkoyV8l+Zsk9yf5Z0PbzktyR5KPJNmZ5OEkp7dtlwL/E/B77e7G77XySnJcknXAOcCvt+3/d5JfS/Ine8RwVZIr99e/kST13eT7epL1SZ5Isj3J+UPbP57kQ0kOBf4ceE17X96V5DXDx6qqHwIbgVcDR7Z9r05yW5KngX+yl8+VNUm+luT77fNkdSs/LMk1rT3bWhtMyjUlk2otpH8EvBT41HBhVe0CbgNOncEx/opBcnwYgwT9j5IcPbT9JOBbwFHA7wDXJElV/RbwX4D3tC4f79mjDRuA64Hfadv/KfBHwOokSwCSHAycDVw3q6glSXt6NYP38aXABcDvJzl8uEJVPQ2cDvx1e19+RVX99XCdJC8BzgMerarvtuJ/AVwKvJLBXezh+icyeA//NWAJ8LPAlrb548Bu4DjgzQw+k36sy6A0zKRaC+ko4LtVtXsv27Yz+BrvBVXVH1fVX1fV/6iqG4EHgROHqjxSVf+xqp5jcPfiaGBsXxpbVduBvwDe3YpWt/bfsy/HkyT9yA+BD7RvLG8DdgGvncX+ZyV5EngU+IfAPxvadktV/df2OfGDPfa7ALi2qja17duq6ptJxoAzgPdX1dNV9QRwBYMbKdJeHbzQDdAB7bvAUUkO3ktifXTb/oKSnAv8KoN+2ACvYJCsT3pscqGqnmld7F7Roc0bgf8d+I/ALwKf6HAsSToQPAfs+YzMixkk0pO+t8fnwDPM7r36pqr6xSm2PfoC+x3L4JvRPf10a+P2oa7ZL5rmWDrAeadaC+n/BZ4F/pfhwiSvYPAV3wTwNPDyoc2vHqr30wyS2/cAR1bVEuAbwE88nDKF2oftfwr8gyRvAN7JoIuIJGlq3+b5Gx+TlgOP7MOxpnvfnu0+jwJ/b4ryZ4GjqmpJ+3lVVb1+H15fBwiTai2YqnqKQT/ojyZZneTFSZYBNzG4S3098DXgjCRHJHk18P6hQxzK4M3yOwDtwZY3zKIJjzN4IHLG29tXhzcD/wn4UlV9exavJ0kHohuBf5PkmCQvSvJ24J8yeC+drccZPIA4V8OuXgOcn+SU1ralSV7Xuvt9Hrg8yavatr+X5H+eo9dVD5lUa0FV1e8Avwl8BPgb4GEGd6bf3h5K+QTwlwweHPk8gzfnyX3vBy5ncMf7cWAl8F9n8fJXAu9qI4NctZft1wDHt3FP/3SofGN7Lbt+SNL0PgD8NwYPCe5k8ND4OVX1jdkeqKq+CXwSeKi9N79mun2mOd6XgPMZ9Jd+Cvh/GHT9ADgXOAS4v7X7ZgZdE6W9StW+fJMi7R/tbv
MHgLcu1rvASf4u8E3g1VX1/YVujyRJWng+qKhFpar+MMluBsPtLbqkOsmLGDwYeYMJtSRJmuSdammG2sQDjzN4uGZ1VfkUuCRJAkyqJUmSpM58UFGSJEnqaFH3qT7qqKNq2bJlC90MAJ5++mkOPfTQhW7GfmFso6evccHijO2ee+75blVNO8On9s2+vNcvxr+TudLn2KDf8fU5Nuh/fF3f6xd1Ur1s2TK+/OUvL3QzAJiYmGB8fHyhm7FfGNvo6WtcsDhjS7Ivk1RohvblvX4x/p3MlT7HBv2Or8+xQf/j6/peb/cPSZIkqSOTakmSJKkjk2pJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpo0U9o6I0F5Zd9Jlp62y57B3z0BJJc+XebU9x3jTXtte1pPnknWpJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpI5NqSZIkqSOTakmSJKmjGSXVSX4lyX1JvpHkk0lemmR5kruSbE5yY5JDWt2XtPXNbfuyoeNc3Mq/leS0/RSTJEmSNK+mTaqTLAV+GVhVVW8ADgLOBj4MXFFVxwE7gQvaLhcAO1v5Fa0eSY5v+70eWA18LMlBcxuOJEmSNP9m2v3jYOBlSQ4GXg5sB94G3Ny2bwTObMtr2jpt+ylJ0spvqKpnq+phYDNwYucIJEmSpAU27YyKVbUtyUeAbwP/Hfg8cA/wZFXtbtW2Akvb8lLg0bbv7iRPAUe28juHDj28z48kWQesAxgbG2NiYmL2Ue0Hu3btWjRtmWt9j239yuemrTdq8ff9nPU1NklSf02bVCc5nMFd5uXAk8AfM+i+sV9U1QZgA8CqVatqfHx8f73UrExMTLBY2jLX+h7b5Xc8PW29LeeM7//GzKG+n7O+xiZJ6q+ZdP94O/BwVX2nqn4IfAp4K7CkdQcBOAbY1pa3AccCtO2HAd8bLt/LPpIkSdLImklS/W3g5CQvb32jTwHuB74IvKvVWQvc0pZvbeu07V+oqmrlZ7fRQZYDK4AvzU0YkiRJ0sKZSZ/qu5LcDHwF2A18lUH3jM8ANyT5UCu7pu1yDfCJJJuBHQxG/KCq7ktyE4OEfDdwYVVN39lVkiRJWuSmTaoBquoS4JI9ih9iL6N3VNUPgHdPcZxLgUtn2UZJkiRpUXNGRUmSJKkjk2pJkiSpI5NqSZIkqSOTakmSJKkjk2pJkiSpI5NqSRJJfiXJfUm+keSTSV6aZHmSu5JsTnJjkkNa3Ze09c1t+7Kh41zcyr+V5LQFC0iS5plJtSQd4JIsBX4ZWFVVbwAOYjDHwIeBK6rqOGAncEHb5QJgZyu/otUjyfFtv9cDq4GPJTloPmORpIViUi1JgsG8BS9LcjDwcmA78Dbg5rZ9I3BmW17T1mnbT2kz7q4BbqiqZ6vqYWAze5nPQJL6aEaTv0iS+quqtiX5CPBt4L8DnwfuAZ6sqt2t2lZgaVteCjza9t2d5CngyFZ+59Chh/f5MUnWAesAxsbGmJiYmFWbx14G61fufsE6sz3mYrFr166RbftM9Dm+PscG/Y+vK5NqSTrAJTmcwV3m5cCTwB8z6L6x31TVBmADwKpVq2p8fHxW+3/0+lu4/N4X/gjbcs7sjrlYTExMMNt/j1HS5/j6HBv0P76u7P4hSXo78HBVfaeqfgh8CngrsKR1BwE4BtjWlrcBxwK07YcB3xsu38s+ktRrJtWSpG8DJyd5eesbfQpwP/BF4F2tzlrglrZ8a1unbf9CVVUrP7uNDrIcWAF8aZ5ikKQFZfcPSTrAVdVdSW4GvgLsBr7KoGvGZ4AbknyolV3TdrkG+ESSzcAOBiN+UFX3JbmJQUK+G7iwqp6b12AkaYGYVEuSqKpLgEv2KH6IvYzeUVU/AN49xXEuBS6d8wZK0iJn9w9JkiSpo2mT6iSvTfK1oZ/vJ3l/kiOSbEryYPt9eKufJFe1GbW+nuSEoWOtbfUfTLJ26leVJEmSRs
e0SXVVfauq3lRVbwL+IfAM8GngIuD2qloB3N7WAU5n8HDKCgZjkF4NkOQIBl8tnsTg68RLJhNxSZIkaZTNtvvHKcBfVdUj/PiMWnvOtHVdDdzJYEimo4HTgE1VtaOqdgKb2M/joEqSJEnzYbZJ9dnAJ9vyWFVtb8uPAWNt+UczbTWTM2pNVS5JkiSNtBmP/pHkEODngYv33FZVlaTmokFdp67dX/o8NWffY1u/cvoRvUYt/r6fs77GJknqr9kMqXc68JWqerytP57k6Kra3rp3PNHKp5pRaxswvkf5xJ4v0nXq2v2lz1Nz9j22y+94etp6ozadcd/PWV9jkyT112y6f/wCz3f9gB+fUWvPmbbObaOAnAw81bqJfA44Ncnh7QHFU1uZJEmSNNJmdKc6yaHAzwH/21DxZcBNSS4AHgHOauW3AWcAmxmMFHI+QFXtSPJB4O5W7wNVtaNzBJIkSdICm1FSXVVPA0fuUfY9BqOB7Fm3gAunOM61wLWzb6YkSZK0eDmjoiRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktTRjJLqJEuS3Jzkm0keSPKWJEck2ZTkwfb78FY3Sa5KsjnJ15OcMHScta3+g0nW7q+gJEmSpPk00zvVVwKfrarXAW8EHgAuAm6vqhXA7W0d4HRgRftZB1wNkOQI4BLgJOBE4JLJRFySJEkaZdMm1UkOA34WuAagqv62qp4E1gAbW7WNwJlteQ1wXQ3cCSxJcjRwGrCpqnZU1U5gE7B6DmORJEmSFsTBM6izHPgO8IdJ3gjcA7wPGKuq7a3OY8BYW14KPDq0/9ZWNlX5j0myjsEdbsbGxpiYmJhpLPvVrl27Fk1b5lrfY1u/8rlp641a/H0/Z32NTZLUXzNJqg8GTgDeW1V3JbmS57t6AFBVlaTmokFVtQHYALBq1aoaHx+fi8N2NjExwWJpy1zre2yX3/H0tPW2nDO+/xszh/p+zvoamySpv2bSp3orsLWq7mrrNzNIsh9v3Tpov59o27cBxw7tf0wrm6pckiRJGmnTJtVV9RjwaJLXtqJTgPuBW4HJETzWAre05VuBc9soICcDT7VuIp8DTk1yeHtA8dRWJkmSJI20mXT/AHgvcH2SQ4CHgPMZJOQ3JbkAeAQ4q9W9DTgD2Aw80+pSVTuSfBC4u9X7QFXtmJMoJEmSpAU0o6S6qr4GrNrLplP2UreAC6c4zrXAtbNonyRJkrToOaOiJEmS1JFJtSRJktSRSbUkiSRLktyc5JtJHkjyliRHJNmU5MH2+/BWN0muSrI5ydeTnDB0nLWt/oNJ1k79ipLULybVkiSAK4HPVtXrgDcCDzCYk+D2qloB3M7zcxScDqxoP+uAqwGSHAFcApwEnAhcMpmIS1LfmVRL0gEuyWHAzwLXAFTV31bVk8AaYGOrthE4sy2vAa6rgTuBJW2+gtOATVW1o6p2ApuA1fMWiCQtoJkOqSdJ6q/lwHeAP0zyRuAe4H3AWJtnAOAxYKwtLwUeHdp/ayubqvwnJFnH4C43Y2Njs56afuxlsH7l7hesM6rT3e/atWtk2z4TfY6vz7FB/+PryqRaknQwg5ly31tVdyW5kue7egCD4VKT1Fy9YFVtADYArFq1qmY7Nf1Hr7+Fy+994Y+wLefM7piLxcTEBLP99xglfY6vz7FB/+Pryu4fkqStwNaququt38wgyX68deug/X6ibd8GHDu0/zGtbKpySeo9k2pJOsBV1WPAo0le24pOAe4HbgUmR/BYC9zSlm8Fzm2jgJwMPNW6iXwOODXJ4e0BxVNbmST1nt0/JEkA7wWuT3II8BBwPoMbLzcluQB4BDir1b0NOAPYDDzT6lJVO5J8ELi71ftAVe2YvxAkaeGYVEuSqKqvAav2su
mUvdQt4MIpjnMtcO2cNk6SRoDdPyRJkqSOTKolSZKkjkyqJUmSpI5MqiVJkqSOTKolSZKkjmaUVCfZkuTeJF9L8uVWdkSSTUkebL8Pb+VJclWSzUm+nuSEoeOsbfUfTLJ2qteTJEmSRsls7lT/k6p6U1VNDrl0EXB7Va0Abuf5KW1PB1a0n3XA1TBIwoFLgJOAE4FLJhNxSZIkaZR16f6xBtjYljcCZw6VX1cDdwJL2vS2pwGbqmpHVe0ENgGrO7y+JEmStCjMdPKXAj6fpID/UFUbgLE2LS3AY8BYW14KPDq079ZWNlX5j0myjsEdbsbGxpiYmJhhE/evXbt2LZq2zLW+x7Z+5XPT1hu1+Pt+zvoamySpv2aaVP/jqtqW5O8Am5J8c3hjVVVLuDtrCfsGgFWrVtX4+PhcHLaziYkJFktb5lrfY7v8jqenrbflnPH935g51Pdz1tfYJEn9NaPuH1W1rf1+Avg0gz7Rj7duHbTfT7Tq24Bjh3Y/ppVNVS5JkiSNtGmT6iSHJnnl5DJwKvAN4FZgcgSPtcAtbflW4Nw2CsjJwFOtm8jngFOTHN4eUDy1lUmSJEkjbSbdP8aATyeZrP+fquqzSe4GbkpyAfAIcFarfxtwBrAZeAY4H6CqdiT5IHB3q/eBqtoxZ5FIkiRJC2TapLqqHgLeuJfy7wGn7KW8gAunONa1wLWzb6YkSZK0eDmjoiRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1NGMk+okByX5apI/a+vLk9yVZHOSG5Mc0spf0tY3t+3Lho5xcSv/VpLT5jwaSZIkaQHM5k71+4AHhtY/DFxRVccBO4ELWvkFwM5WfkWrR5LjgbOB1wOrgY8lOahb8yVJkqSFN6OkOskxwDuAP2jrAd4G3NyqbATObMtr2jpt+ymt/hrghqp6tqoeBjYDJ85BDJIkSdKCOniG9X4X+HXglW39SODJqtrd1rcCS9vyUuBRgKraneSpVn8pcOfQMYf3+ZEk64B1AGNjY0xMTMywifvXrl27Fk1b5lrfY1u/8rlp641a/H0/Z32NTZLUX9Mm1UneCTxRVfckGd/fDaqqDcAGgFWrVtX4+H5/yRmZmJhgsbRlrvU9tsvveHraelvOGd//jZlDfT9nfY1NktRfM7lT/Vbg55OcAbwUeBVwJbAkycHtbvUxwLZWfxtwLLA1ycHAYcD3hsonDe8jSZIkjaxp+1RX1cVVdUxVLWPwoOEXquoc4IvAu1q1tcAtbfnWtk7b/oWqqlZ+dhsdZDmwAvjSnEUiSZIkLZAu41T/BvCrSTYz6DN9TSu/Bjiylf8qcBFAVd0H3ATcD3wWuLCqpu/sKkmaFw6dKkn7bqYPKgJQVRPARFt+iL2M3lFVPwDePcX+lwKXzraRkqR5MTl06qva+uTQqTck+fcMhky9mqGhU5Oc3er98z2GTn0N8J+T/H1voEg6EDijoiTJoVMlqaNZ3amWJPXW7zJPQ6dC9+FTx14G61fufsE6ozo0Y9+HlexzfH2ODfofX1cm1ZJ0gJvvoVOh+/CpH73+Fi6/94U/wkZtqMxJfR9Wss/x9Tk26H98XZlUS5IcOlWSOrJPtSQd4Bw6VZK68061JGkqvwHckORDwFf58aFTP9GGTt3BIBGnqu5LMjl06m4cOlXSAcSkWpL0Iw6dKkn7xu4fkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHjv4hSeqlZRd9Zto6Wy57xzy0RNKBwDvVkiRJUkcm1ZIkSVJHJtWSJElSR9Mm1UlemuRLSf4yyX1JfruVL09yV5LNSW5Mckgrf0lb39y2Lxs61sWt/FtJTttvUUmSJEnzaCZ3qp8F3lZVbwTeBKxOcjLwYeCKqjoO2Alc0OpfAO
xs5Ve0eiQ5HjgbeD2wGvhYkoPmMBZJkiRpQUybVNfArrb64vZTwNuAm1v5RuDMtrymrdO2n5IkrfyGqnq2qh4GNgMnzkUQkiRJ0kKa0ZB67Y7yPcBxwO8DfwU8WVW7W5WtwNK2vBR4FKCqdid5Cjiyld85dNjhfYZfax2wDmBsbIyJiYnZRbSf7Nq1a9G0Za71Pbb1K5+btt6oxd/3c9bX2CRJ/TWjpLqqngPelGQJ8GngdfurQVW1AdgAsGrVqhofH99fLzUrExMTLJa2zLW+x3b5HU9PW2/LOeP7vzFzqO/nrK+xSZL6a1ajf1TVk8AXgbcAS5JMJuXHANva8jbgWIC2/TDge8Ple9lHkiRJGlkzGf3jp9odapK8DPg54AEGyfW7WrW1wC1t+da2Ttv+haqqVn52Gx1kObAC+NIcxSFJkiQtmJl0/zga2Nj6Vb8IuKmq/izJ/cANST4EfBW4ptW/BvhEks3ADgYjflBV9yW5Cbgf2A1c2LqVSJIkSSNt2qS6qr4OvHkv5Q+xl9E7quoHwLunONalwKWzb6YkSZK0eDmjoiRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1NGMpimX+m7ZRZ+Zts6Wy94xDy2RJEmjyDvVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJHJtWSJElSRybVkiRJUkcm1ZIkSVJH0ybVSY5N8sUk9ye5L8n7WvkRSTYlebD9PryVJ8lVSTYn+XqSE4aOtbbVfzDJ2v0XliRJkjR/ZnKnejewvqqOB04GLkxyPHARcHtVrQBub+sApwMr2s864GoYJOHAJcBJwInAJZOJuCRJkjTKpk2qq2p7VX2lLf8N8ACwFFgDbGzVNgJntuU1wHU1cCewJMnRwGnApqraUVU7gU3A6rkMRpIkSVoIs+pTnWQZ8GbgLmCsqra3TY8BY215KfDo0G5bW9lU5ZIkSdJIO3imFZO8AvgT4P1V9f0kP9pWVZWk5qJBSdYx6DbC2NgYExMTc3HYznbt2rVo2jLX+h7b+pXPzcmxFtO/Ud/PWV9jW6ySHAtcx+DmSAEbqurK1m3vRmAZsAU4q6p2ZvABcCVwBvAMcN7kN5rteZl/0w79oaraiCQdAGaUVCd5MYOE+vqq+lQrfjzJ0VW1vXXveKKVbwOOHdr9mFa2DRjfo3xiz9eqqg3ABoBVq1bV+Pj4nlUWxMTEBIulLXOt77FdfsfTc3KsLeeMz8lx5kLfz1lfY1vEJp+d+UqSVwL3JNkEnMfg2ZnLklzE4NmZ3+DHn505icGzMycNPTuzikFyfk+SW1uXP0nqtZmM/hHgGuCBqvp3Q5tuBSZH8FgL3DJUfm4bBeRk4KnWTeRzwKlJDm8PKJ7ayiRJC8hnZySpu5ncqX4r8C+Be5N8rZX9JnAZcFOSC4BHgLPattsYfCW4mcHXgucDVNWOJB8E7m71PlBVO+YiCEnS3JivZ2e6dvUbexmsX7l7VvvszWLsatT3LlB9jq/PsUH/4+tq2qS6qu4AMsXmU/ZSv4ALpzjWtcC1s2mgJGl+zNezM+14nbr6ffT6W7j83hk/FjSlxdSta1Lfu0D1Ob4+xwb9j68rZ1SUJL3gszNt+0yfndlbuST1nkm1JB3gfHZGkrrr/t2ZJGnU+eyMJHVkUi1JBzifnZGk7uz+IUmSJHVkUi1JkiR1ZFItSZIkdWRSLUmSJHVkUi1JkiR15Ogf0hxZdtFnpq2z5bJ3zENLJEnSfPNOtSRJktSRSbUkSZLUkUm1JEmS1JFJtSRJktSRSbUkSZLUkUm1JEmS1NG0SXWSa5M8keQbQ2VHJNmU5MH2+/BWniRXJdmc5OtJThjaZ22r/2CStfsnHEmSJGn+zeRO9ceB1XuUXQTcXlUrgNvbOsDpwIr2sw64GgZJOHAJcBJwInDJZCIuSZIkjbppk+qq+gtgxx7Fa4CNbXkjcOZQ+XU1cCewJMnRwGnApq
raUVU7gU38ZKIuSZIkjaR9nVFxrKq2t+XHgLG2vBR4dKje1lY2VflPSLKOwV1uxsbGmJiY2Mcmzq1du3YtmrbMtb7Htn7lc3NyrOn+jdav3N35GDPV93PW19i0+Ew3E6qzoEqaqc7TlFdVJam5aEw73gZgA8CqVatqfHx8rg7dycTEBIulLXOt77FdfsfTc3KsLeeMv+D282YyTfk0x5ipvp+zvsYmSeqvfR394/HWrYP2+4lWvg04dqjeMa1sqnJJkiRp5O1rUn0rMDmCx1rglqHyc9soICcDT7VuIp8DTk1yeHtA8dRWJkmSJI28abt/JPkkMA4clWQrg1E8LgNuSnIB8AhwVqt+G3AGsBl4BjgfoKp2JPkgcHer94Gq2vPhR2lRm67vpSRJOnBNm1RX1S9MsemUvdQt4MIpjnMtcO2sWidJkiSNAGdUlCRJkjoyqZYkSZI6MqmWJEmSOjKpliRJkjrqPPmLpMXJmeIkSZo/3qmWJEmSOjKpliRJkjqy+4ckSVOYyaRPdqWSBN6pliRJkjozqZYkSZI6MqmWJEmSOjKpliRJkjoyqZYkSZI6MqmWJEmSOjKpliRJkjpynGpJveGYwpKkhTLvSXWS1cCVwEHAH1TVZfPdBvXHdEnU+pW7WUz/d5yrpO/ebU9x3gyONR9MZLWnA+19fibXwEx4nUijbV6zjSQHAb8P/BywFbg7ya1Vdf98tkNazGbyAb1+5Tw0hLlLFnTg8H1e0oFqvm/hnQhsrqqHAJLcAKwBfLM9wJisaU+TfxPrV+6e8i68d/JGgu/z+2gm18BMTHed+O2StH+kqubvxZJ3Aaur6pfa+r8ETqqq9wzVWQesa6uvBb41bw18YUcB313oRuwnxjZ6+hoXLM7YfrqqfmqhGzEKZvI+38q7vtcvxr+TudLn2KDf8fU5Nuh/fK+tqlfu686Lp7NpU1UbgA0L3Y49JflyVa1a6HbsD8Y2evoaF/Q7Nj2v63t9n/9O+hwb9Du+PscGB0Z8Xfaf7yH1tgHHDq0f08okSf3g+7ykA9J8J9V3AyuSLE9yCHA2cOs8t0GStP/4Pi/pgDSv3T+qaneS9wCfYzDU0rVVdd98tqGDRdclZQ4Z2+jpa1zQ79h6bx7f5/v8d9Ln2KDf8fU5NjC+FzSvDypKkiRJfeQ05ZIkSVJHJtWSJElSRybVM5RkfZJKclRbT5KrkmxO8vUkJyx0G2cryb9N8s3W/k8nWTK07eIW27eSnLaAzdwnSVa3tm9OctFCt6eLJMcm+WKS+5Pcl+R9rfyIJJuSPNh+H77Qbd0XSQ5K8tUkf9bWlye5q527G9vDbhLQr2sb+n99Q7+v8SRLktzcPksfSPKWvpy7JL/S/ia/keSTSV46yucuybVJnkjyjaGyvZ6rfc3xTKpnIMmxwKnAt4eKTwdWtJ91wNUL0LSuNgFvqKp/APx/wMUASY5n8MT+64HVwMcymHp4JOT5aZJPB44HfqHFNKp2A+ur6njgZODCFs9FwO1VtQK4va2PovcBDwytfxi4oqqOA3YCFyxIq7To9PDahv5f39Dva/xK4LNV9TrgjQziHPlzl2Qp8MvAqqp6A4OHjs9mtM/dxxnkNMOmOlf7lOOZVM/MFcCvA8NPda4BrquBO4ElSY5ekNbto6r6fFXtbqt3MhhPFgax3VBVz1bVw8BmBlMPj4ofTZNcVX8LTE6TPJKqantVfaUt/w2DN+2lDGLa2KptBM5ckAZ2kOQY4B3AH7T1AG8Dbm5VRjIu7Te9urah39c39PsaT3IY8LPANQBV9bdV9SQ9OXcMRoh7WZKDgZcD2xnhc1dVfwHs2KN4qnO1TzmeSfU0kqwBtlXVX+6xaSnw6ND61lY2qv5X4M/b8qjHNurtn1KSZcCbgbuAsara3jY9BowtVLs6+F0G/2H9H239SODJof/s9ebcaU709tqGXl7f0O9rfDnwHeAPW/eWP0hyKD04d1W1DfgIg2/otwNPAffQn3M3aa
pztU/vNSbVQJL/3PoM7fmzBvhN4P9c6Dbuq2lim6zzWwy+grx+4Vqq6SR5BfAnwPur6vvD22owNuZIjY+Z5J3AE1V1z0K3RVpofbu+4YC4xg8GTgCurqo3A0+zR1ePET53hzO4W7sceA1wKD/ZdaJX5uJczevkL4tVVb19b+VJVjL4g/rLwTdWHAN8JcmJjMhUvFPFNinJecA7gVPq+UHLRyK2FzDq7f8JSV7M4AP3+qr6VCt+PMnRVbW9fS31xMK1cJ+8Ffj5JGcALwVexaB/4pIkB7e7ISN/7jSnendtQ2+vb+j/Nb4V2FpVd7X1mxkk1X04d28HHq6q7wAk+RSD89mXczdpqnO1T+813ql+AVV1b1X9napaVlXLGFxAJ1TVYwym3T23PSF6MvDU0FcIIyHJagZfy/18VT0ztOlW4OwkL0mynEFH/S8tRBv3Ua+mSW59EK8BHqiqfze06VZgbVteC9wy323roqourqpj2rV1NvCFqjoH+CLwrlZt5OLSftWraxv6e31D/6/xlgs8muS1regU4H56cO4YdPs4OcnL29/oZGy9OHdDpjpX+5TjOaPiLCTZwuBJ2O+2P7LfY/B1yDPA+VX15YVs32wl2Qy8BPheK7qzqv5V2/ZbDPpZ72bwdeSf7/0oi1O7M/K7PD9N8qUL26J9l+QfA/8FuJfn+yX+JoN+lzcBfxd4BDirqvZ8CGMkJBkH/nVVvTPJzzB4AO0I4KvAL1bVswvYPC0ifbq24cC4vqG/13iSNzF4CPMQ4CHgfAY3LEf+3CX5beCfM8gDvgr8EoN+xSN57pJ8EhgHjgIeBy4B/pS9nKt9zfFMqiVJkqSO7P4hSZIkdWRSLUmSJHVkUi1JkiR1ZFItSZIkdWRSLUmSJHVkUi1JkiR1ZFItSZIkdfT/A04laE5zMHRJAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
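],
"source": [
"# Histograms of Quantity and UnitPrice, restricted to -50 < Quantity < 50 and 0 < UnitPrice < 100\n",
"df[(df['Quantity']>-50) & \n",
" (df['Quantity']<50) & \n",
" (df['UnitPrice']>0) & \n",
" (df['UnitPrice']<100)][['Quantity', 'UnitPrice']].hist(figsize=[12,4], bins=30)\n",
"plt.show()"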
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:23.584264Z",
"iopub.status.busy": "2021-12-15T20:25:23.583784Z",
"iopub.status.idle": "2021-12-15T20:25:24.494000Z",
"shell.execute_reply": "2021-12-15T20:25:24.493618Z"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAs4AAAEICAYAAABPtXIYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy89olMNAAAACXBIWXMAAAsTAAALEwEAmpwYAAAeNUlEQVR4nO3df5Qd91nf8fcndpwEq5FJDIojuciJjKmxaEP22ElpqVwSkLEVAycNdkzBqbFOODWFVoUqQMuPklND40KMDRwRGyVgrLgmTaRYIaGUbQgEcMyP2vGPIhwllnGsGBLBKjSJzNM/7iy5bLTauXvv3bt39v06x8d7Z+6deR7Nndlnn/nOTKoKSZIkSaf2jEkHIEmSJE0DC2dJkiSpBQtnSZIkqQULZ0mSJKkFC2dJkiSpBQtnSZIkqQULZ61JSeaSvGjScUiSIMnPJ/mPq3V50jwLZ62YJNcmuT/Jp5N8PMnPJlm/AuudTfKd/dOqal1VPdrM35vkx8cdhyR1VZJKsmXBtB9J8sttPl9Vr6+q/9x8bluSIydZ1ueapsenkvxOkpe3WZ40ShbOWhFJdgE/AXwfsB54GbAZeF+SZ04wNEnSdHh7Va0DvgT4APCOJFn4piSnrXhkWjMsnDV2SZ4L/Cjw3VX1a1X1uao6DLwGeBHw2oVd34UdhyS7k/xpkr9K8mCSb+6bd22SDyR5U5JPJvlIksuaeW8E/ilwS9OpuKWZXkm2JNkJXAN8fzP/QJLvS/KrC3K4Ocmbx/VvJEldNn9MT7IrydEkTyR5Xd/8vUl+PMmZwHuAFzbH5LkkL+xfVlV9Dngr8ALg+c1nfy7JwSTHgUtP8jvlyiR/lOQvm98l25vp65Pc1sTzeBODhbcWZeGslfCPgWcD7+ifWFVzwEHg61ss40/pFcDr6RXhv5zknL75lwCPAGcDPwncliRV9YPAbwE3NMMzblgQwx7gDuAnm/k7gF8Gtic5CyDJ6cBVwNsGylqS1O8F9I7hG4HrgFuTfHH/G6rqOHAZ8GfNMXldVf1Z/3uSPAu4Fnisqp5qJr8WeCPw9+h1o/vffzG94/f3AWcBXwscbmbvBU4AW4CX0Pt99HeG9kn9LJy1Es4GnqqqEyeZ9wS9026nVFX/var+rKr+pqreDvwJcHHfWz5aVb9QVU/T60ScA2xYTrBV9QTwfuBfNJO2N/Hft5zlSZIA+BzwY81Zx4PAHHDBAJ9/TZJPAY8BLwW+uW/eu6rqt5vfEf9vweeuA26vql9v5j9eVQ8n2QB8I/C9VXW8qo4CP0WvUSKd1OmTDkBrwlPA2UlOP0nxfE4z/5SSfDvw7+iNiwZYR68gn/fx+R+q6tPNsLd1Q8T8VuC7gF8Avg34pSGWJUld9zSw8HqVZ9Irluf9+YLfAZ9msOP0XVX1bYvMe+wUnzuX3tnNhb6sifGJvqHSz1hiWVrj7DhrJXwQ+AzwLf0Tk6yjd0puFjgOfFHf7Bf0ve/L6BWwNwDPr6qzgAeAL7goZBG1jPnvBL4qyUXAFfSGc0iSTu5jfL6xMe884KPLWNZSx+xBP/MY8OJFpn8GOLuqzmr+e25VfeUy1q81wsJZY1dVx+iNS/6ZJNuTPDPJZuAuet3mO4A/Ar4xyfOSvAD43r5FnEnvoPgJgOaCkosGCOFJehchtp7fnOq7G/gV4Per6mMDrE+S1pq3Az+UZFOSZyR5BbCD3nF0UE/Su+hvVLcrvQ14XZKva2LbmOQrmmF57wNuSvLcZt6Lk/yzEa1XHWThrBVRVT8J/ADwJuCvgI/Q6zC/orkY5JeAP6Z3wcb76B2E5z/7IHATvc71k8BW4LcHWP2bgVc3d9y4+STzbwMubO4N+s6+6W9t1uUwDUk6tR8DfofehXmfpHeR9jVV9cCgC6qqh4E7gUeb4/ILl/rMEsv7feB19MYvHwP+N71hGgDfDpwBPNjEfTe9IYTSSaVqOWdEpOE0XeMfA75mtXZzk/x94G
HgBVX1l5OOR5IkTZYXB2oiquoXk5ygd6u6VVc4J3kGvYsR91k0S5IksOMsfYHmBvxP0ruoZXtVeYW1JEmycJYkSZLa8OJASZIkqYVVMcb57LPPrs2bN086DACOHz/OmWeeOekwxqKruXU1L+hubqs1r/vuu++pqlrySZZankGP9av1ezIq5je9upwbdDu/48eP8/DDDy/7WL8qCufNmzfzoQ99aNJhADA7O8u2bdsmHcZYdDW3ruYF3c1tteaVZDkPa1BLgx7rV+v3ZFTMb3p1OTfodn6zs7Nceumlyz7WO1RDkiRJamEshXOSM5N8KMkV41i+JEmStNJaFc5Jbk9yNMkDC6ZvT/JIkkNJdvfN+g/0HqcsSZIkdULbjvNeYHv/hCSnAbcClwEXAlcnuTDJK+k9uvLoCOOUJK0ynl2UtNa0ujiwqt6fZPOCyRcDh6rqUYAk+4ArgXXAmfSK6b9OcrCq/mbhMpPsBHYCbNiwgdnZ2eXmMFJzc3OrJpZR62puXc0LuptbV/OadkluB64AjlbVRX3TtwNvBk4D3lJVNzazPLsoaU0Z5q4aG4H+J6odAS6pqhsAklwLPHWyohmgqvYAewBmZmZqtVy92fUrSbuYW1fzgu7m1tW8OmAvcAvwtvkJfWcXX0nvOH9vkv30fgc8CDx75cOUpMkY2+3oqmrvuJYtSRq91XZ2setnJsxvenU5N+h2fnNzc0N9fpjC+XHg3L7Xm5pprSXZAezYsmXLEGFIksZoYmcXu35mwvymV5dzg27nN+wfBMMUzvcC5yc5j17BfBXw2kEWUFUHgAMzMzPXDxGHBMDm3feccv7hGy9foUiktaPN2cVhmiRL7dfgvi1p5bS9Hd2dwAeBC5IcSXJdVZ0AbgDeCzwE3FVVHx5k5Ul2JNlz7NixQeOWJK2Moc8uVtWBqtq5fv36kQYmSSut7V01rl5k+kHg4HJXbsdZkla9oc8uSlJX+MhtSRLg2UVJWsrY7qrRhhcHStLq4dlFSTq1iXacHfcmSZKkaeFQDUnSWDlUQ1JXWDhLksbKs4uSumKihbNdCEmSJE0LxzhLksbKJomkrnCohiRprGySSOoKC2dJkiSpBcc4S5IkSS04xlmSNFY2SSR1hUM1JEljZZNEUldYOEuSJEktWDhLkiRJLXhxoCRJktSCFwdKksbKJomkrnCohiRprGySSOoKC2dJkiSphdMnHYAkScPYvPueU84/fOPlKxSJpK6z4yxJkiS14F01JEmSpBa8q4YkaaxskkjqCodqSJLGyiaJpK6wcJYkSZJa8K4aWjOWuvIevPpekiQtzo6zJEmS1IKFsyRJktSChbMkSZLUgoWzJEmS1IIPQJEkSZJa8AEokqSxskkiqSscqiFJGiubJJK6wsJZkiRJasHCWZIkSWrBwlmSJElqwcJZkiRJauH0SQcgTZvNu+9Z8j2Hb7x8BSKRJEkryY6zJEmS1IKFsyRJktSChbMkSZLUgoWzJEmS1MLIC+ck/yDJzye5O8l3jXr5kiRJ0iS0KpyT3J7kaJIHFkzfnuSRJIeS7Aaoqoeq6vXAa4CvGX3IkqRJs0kiaS1q23HeC2zvn5DkNOBW4DLgQuDqJBc2814F3AMcHFmkkqSxskkiSafW6j7OVfX+JJsXTL4YOFRVjwIk2QdcCTxYVfuB/UnuAX7lZMtMshPYCbBhwwZmZ2eXlcCozc3NrZpYRq2ruc3ntWvriaGX1ebfp816RvXv3PVtplVnL3AL8Lb5CX1NklcCR4B7k+yvqgebJsl3Ab80gVhb897rkkZlmAegbAQe63t9BLgkyTbgW4BncYqOc1XtAfYAzMzM1LZt24YIZXRmZ2dZLbGMWldzm8/r2ha/HJdy+JptS76nzXraLKeNrm8zrS6rrUkyNzfHrq1PD5rGskziD7mu/wHZ5fy6nBt0O7+5ubmhPj/yJwdW1Sww2+a9SXYAO7Zs2TLqMCRJozGxJsns7Cw3feD4wAEvx6j+2B
1E1/+A7HJ+Xc4Nup3fsH8QDFM4Pw6c2/d6UzOttao6AByYmZm5fog4JEkrzCaJpLVomML5XuD8JOfRK5ivAl47kqikCWkzFlJaY2ySSFKj7e3o7gQ+CFyQ5EiS66rqBHAD8F7gIeCuqvrwICtPsiPJnmPHjg0atyRpZfxtkyTJGfSaJPsnHJMkTUSrwrmqrq6qc6rqmVW1qapua6YfrKovr6oXV9UbB115VR2oqp3r168f9KOSpBGzSSJJpzbyiwMlSdOpqq5eZPpBhrgvv0M1JHXFyB+5PQi7EJIkSZoWEy2cHaohSd1nk0RSV0y0cJYkdZ9NEkld4VANSZIkqYWJXhzoBSPScNrcd/rwjZevQCTS4nwAiqSucKiGJGmsHKohqSssnCVJkqQWJjpUw9N3kqTVwGFPktrwdnSSpLHyQnBJXeFQDUnSWNkkkdQVFs6SJElSCxbOkiRJUgs+AEWSJElqwYsDJUljZZNEUlc4VEOSNFY2SSR1hYWzJEmS1IKFsyRJktSChbMkSZLUgnfVkCRJklrwrhqSpLGySSKpKxyqIUkaK5skkrrCwlmSJElqwcJZkiRJasHCWZIkSWrBwlmSJElqwcJZkiRJasHCWZIkSWrBB6BIkiRJLfgAFEnSWNkkkdQVDtWQJI2VTRJJXWHhLEmSJLVg4SxJkiS1YOEsSZIktXD6pAOQpEFt3n3Pku85fOPlKxCJJGktseMsSZIktWDhLEmSJLVg4SxJkiS14BhnSZJGxPH3UrdZOEtqxYJAkrTWjaVwTvJNwOXAc4Hbqup941iPJEmStFJaj3FOcnuSo0keWDB9e5JHkhxKshugqt5ZVdcDrwe+dbQhS5ImLck3JfmFJG9P8vWTjkeSVsIgHee9wC3A2+YnJDkNuBV4JXAEuDfJ/qp6sHnLDzXzJUmrXJLbgSuAo1V1Ud/07cCbgdOAt1TVjVX1TuCdSb4YeBPQ+TOLbYYrSeq21oVzVb0/yeYFky8GDlXVowBJ9gFXJnkIuBF4T1X9wcmWl2QnsBNgw4YNzM7ODh79GMzNza2aWEatq7nN57Vr64lJh/K3fuaOdy35nq0b1y/5nqW2WZucR7XNR7muYb+LK5n3GrMXGySStKhhxzhvBB7re30EuAT4buAVwPokW6rq5xd+sKr2AHsAZmZmatu2bUOGMhqzs7OsllhGbZpzO1WnZ9fWp7npA8eZtmtdD1+zbcn3/Mwd72pyW8zSObdZTxvXtrk4sOW6hv0ujjIWfd6oGyTN+5fdJJmbm2PX1qcHymEazP8bdLWZMa/L+XU5N+h2fnNzc0N9fiyVRlXdDNy81PuS7AB2bNmyZRxhSJKGt+wGCQzXJJmdnV3iD8fpNP9H3TQ3M9rocn5dzg26nd+wfxAM+wCUx4Fz+15vaqa1UlUHqmrn+vVLn7KWJK0eVXVzVb20ql6/WNE8L8mOJHuOHTu2UuFJ0lgMWzjfC5yf5LwkZwBXAfuHD0uStEoM1SABmySSumOQ29HdCXwQuCDJkSTXVdUJ4AbgvcBDwF1V9eEBlmkXQpJWNxskktRoXThX1dVVdU5VPbOqNlXVbc30g1X15VX14qp64yArtwshSavHOBokzXJtkkjqhOm6DYEkaWyq6upFph8EDg6x3APAgZmZmeuXuwxJWg2GHeM8FLsQkiRJmhYTLZwdqiFJ3WeTRFJXOFRDmpA2j+/dtXUFApHGzKEakrpiooWzD0CRVoc2RbwkSWudQzUkSWPlUA1JXTHRwlmS1H02SSR1hYWzJEmS1IK3o5MkSZJacIyzJGmsbJJI6gqHakiSxsomiaSu8D7OkiStoPnbP+7aeoJrF7kV5OEbL1/JkCS1ZMdZkiRJasGLAyVJkqQWvDhQkjRWNkkkdYVDNSRJY2WTRFJXWDhLkiRJLXhXDUkrxrsJSJKmmR1nSZIkqQULZ0mSJKmFiQ7VSLID2LFly5ZJhqEJ27zIKXtJ3eCxXlJXTLRwrqoDwIGZmZnrJx
mHJGl8PNYPbqmGgtcCSJPhUA1JkiSpBQtnSZIkqQULZ0mSJKkFC2dJkiSpBQtnSZIkqQULZ0mSJKmFiRbOSXYk2XPs2LFJhiFJkiQtaaKFc1UdqKqd69evn2QYkqQxskkiqSscqiFJGiubJJK6wsJZkiRJamGij9xW9y312FhJkqRpYcdZkiRJasHCWZIkSWrBoRqSJHVQm6Fyh2+8fAUikbrDjrMkSZLUgoWzJEmS1IJDNSRJWqMcziENxo6zJEmS1MLIC+ckL0pyW5K7R71sSZIkaVJaFc5Jbk9yNMkDC6ZvT/JIkkNJdgNU1aNVdd04gpUkrQ42SSStRW07znuB7f0TkpwG3ApcBlwIXJ3kwpFGJ0laMTZJJOnUWhXOVfV+4C8WTL4YONQcPD8L7AOuHHF8kqSVsxebJJK0qFRVuzcmm4F3V9VFzetXA9ur6jub1/8SuAT4YeCNwCuBt1TVf1lkeTuBnQAbNmx46b59+4bLZETm5uZYt27dpMMYi0nkdv/jx8a+jg3PgSf/euyrmYhpy23rxvWnnD//fThVXksto385w8RyMpdeeul9VTUz8Ac75CTH+pcDP1JV39C8fgPA/LE9yd1V9epTLG/Zx/q5uTk+cuzpZWay+g2zf49qPxnVuk7G36fTq8v5zc3NsWPHjmUf60d+O7qq+nPg9S3etwfYAzAzM1Pbtm0bdSjLMjs7y2qJZdQmkdu1LW51NKxdW09w0/3dvLPitOV2+Jptp5w//304VV5LLaN/OcPEotY2Ao/1vT4CXJLk+fSaJC9J8obFmiTDHOtnZ2e56QPHlxv3qjfM/j2q/WRU6zoZf59Ory7nNzs7O9Tnh/mN/Dhwbt/rTc201pLsAHZs2bJliDA0KW3u/ympm9o2ScBjvaTuGOZ2dPcC5yc5L8kZwFXA/kEWUFUHqmrn+vXLOw0kSRq7oZskHusldUXb29HdCXwQuCDJkSTXVdUJ4AbgvcBDwF1V9eFBVp5kR5I9x46NfxysJGlZhm6SSFJXtBqqUVVXLzL9IHBwuSuvqgPAgZmZmeuXuwxJ0mg0TZJtwNlJjgA/XFW3JZlvkpwG3L6cJgkO1ZhaPpZb+rzpuepIkjRWNkkk6dRG/sjtQThUQ5IkSdNiooWzF4xIUvfZJJHUFRMtnCVJ3WeTRFJXWDhLkiRJLUz04kCvtJak7vNYP3o+gEqaDMc4S5LGymO9pK5wqIYkSZLUgoWzJEmS1IJjnKeMT3CSNG081kvqCsc4S5LGymO9pK5wqIYkSZLUgoWzJEmS1IKFsyRJktSCFwdKksbKY71GxQvkNWleHChJGiuP9ZK6wqEakiRJUgsWzpIkSVILFs6SJElSCxbOkiRJUgtTe1eNlbyy1qt4pXba7Ctae7yrhqSu8K4akqSx8lgvqSscqiFJkiS1YOEsSZIktWDhLEmSJLVg4SxJkiS1YOEsSZIktWDhLEmSJLVg4SxJkiS1MLUPQJEkTQeP9d13socf7dp6gmub6Sv5kDAfWqZx8gEokqSx8lgvqSscqiFJkiS1YOEsSZIktWDhLEmSJLVg4SxJkiS1YOEsSZIktWDhLEmSJLVg4SxJkiS1YOEsSZIktWDhLEmSJLVg4SxJkiS1cPqoF5jkTOBngc8Cs1V1x6jXIUmaLI/1ktaiVh3nJLcnOZrkgQXTtyd5JMmhJLubyd8C3F1V1wOvGnG8kqQx8VgvSafWdqjGXmB7/4QkpwG3ApcBFwJXJ7kQ2AQ81rzt6dGEKUlaAXvxWC9Ji0pVtXtjshl4d1Vd1Lx+OfAjVfUNzes3NG89Anyyqt6dZF9VXbXI8nYCOwE2bNjw0n379g0U+P2PH1vyPVs3rh9omQBzc3OsW7duRda1HG1iWcyG58CTf937eRTxDhPLKPXn1TVdze1UebX5bo5rn7z00kvvq6qZgT/YIavpWD83N8dHjnW3Ju/q/j1v0PxGte+vxLo2PAe+9Hkr83t/VAb5txv2GD0qS8W83D
pvx44dyz7WDzPGeSOf7zZA7yB6CXAzcEuSy4EDi324qvYAewBmZmZq27ZtA6382t33LPmew9cMtkyA2dlZFsYyrnUtR5tYFrNr6wluur+3yUcR7zCxjFJ/Xl3T1dxOlVeb7+Zq2ifXgIkd62dnZ7npA8eXEfJ06Or+PW/Q/Ea176/EunZtPcFrBqxbJm2Qf7thj9GjslTMy63zhjHyPbaqjgOva/PeJDuAHVu2bBl1GJKkMfJYL2ktGuZ2dI8D5/a93tRMa62qDlTVzvXrp+t0hyStIR7rJakxTOF8L3B+kvOSnAFcBewfTViSpFXCY70kNdreju5O4IPABUmOJLmuqk4ANwDvBR4C7qqqDw+y8iQ7kuw5dmx1XGQmSWuZx3pJOrVWY5yr6upFph8EDi535VV1ADgwMzNz/XKXIUkaDY/1knRqPnJbkiRJamGihbOn7ySp+zzWS+qKiRbOXmktSd3nsV5SV7R+cuBYg0g+AXx00nE0zgaemnQQY9LV3LqaF3Q3t9Wa15dV1ZdMOoiuWsaxfrV+T0bF/KZXl3ODbud3NnDmco/1q6JwXk2SfKirj9ztam5dzQu6m1tX89Jodf17Yn7Tq8u5QbfzGzY3Lw6UJEmSWrBwliRJklqwcP5CeyYdwBh1Nbeu5gXdza2reWm0uv49Mb/p1eXcoNv5DZWbY5wlSZKkFuw4S5IkSS1YOEuSJEktWDgvkGRXkkpydvM6SW5OcijJ/0ny1ZOOcRBJ/muSh5vY/0eSs/rmvaHJ65Ek3zDBMJctyfYm/kNJdk86nuVKcm6S30zyYJIPJ/meZvrzkvx6kj9p/v/Fk451uZKcluQPk7y7eX1ekt9rtt3bk5wx6Ri1enRl34a1sX9Dt/fxJGclubv5ffpQkpd3Zfsl+bfN9/KBJHcmefY0b7sktyc5muSBvmkn3VbLqfEsnPskORf4euBjfZMvA85v/tsJ/NwEQhvGrwMXVdVXAf8XeANAkguBq4CvBLYDP5vktIlFuQxNvLfS20YXAlc3eU2jE8CuqroQeBnwr5tcdgO/UVXnA7/RvJ5W3wM81Pf6J4CfqqotwCeB6yYSlVadju3bsDb2b+j2Pv5m4Neq6iuAf0gvz6nffkk2Av8GmKmqi4DT6NUG07zt9tKra/ottq0GrvEsnP+unwK+H+i/YvJK4G3V87vAWUnOmUh0y1BV76uqE83L3wU2NT9fCeyrqs9U1UeAQ8DFk4hxCBcDh6rq0ar6LLCPXl5Tp6qeqKo/aH7+K3oH5Y308nlr87a3At80kQCHlGQTcDnwluZ1gH8O3N28ZWpz01h0Zt+G7u/f0O19PMl64GuB2wCq6rNV9Sm6s/1OB56T5HTgi4AnmOJtV1XvB/5iweTFttXANZ6FcyPJlcDjVfXHC2ZtBB7re32kmTaN/hXwnubnLuTVhRy+QJLNwEuA3wM2VNUTzayPAxsmFdeQfpreH6V/07x+PvCpvj/qOrHtNDKd3Lehs/s3dHsfPw/4BPCLzVCUtyQ5kw5sv6p6HHgTvTPtTwDHgPvozrabt9i2GvhYs6YK5yT/sxnDs/C/K4EfAP7TpGNcjiXymn/PD9I7XXjH5CLVUpKsA34V+N6q+sv+edW7d+TU3T8yyRXA0aq6b9KxSJPUxf0b1sQ+fjrw1cDPVdVLgOMsGJYxrduvGet7Jb0/Dl4InMkXDnPolGG31ekjjGXVq6pXnGx6kq30vjR/3Du7xCbgD5JcDDwOnNv39k3NtFVjsbzmJbkWuAL4uvr8jbtXfV4tdCGHv5XkmfR+qd5RVe9oJj+Z5JyqeqI5fXR0chEu29cAr0ryjcCzgefSGy94VpLTm67GVG87jVyn9m3o9P4N3d/HjwBHqur3mtd30yucu7D9XgF8pKo+AZDkHfS2Z1e23bzFttXAx5o11XFeTFXdX1VfWlWbq2ozvZ3kq6vq48B+4NubKy9fBhzra/evekm20zt99qqq+nTfrP3AVUmeleQ8egPjf3
8SMQ7hXuD85urfM+hd0LB/wjEtSzMe8Dbgoar6b32z9gPf0fz8HcC7Vjq2YVXVG6pqU7NvXQX8r6q6BvhN4NXN26YyN41NZ/Zt6Pb+Dd3fx5ta4LEkFzSTvg54kG5sv48BL0vyRc33dD63Tmy7Pottq4FrPJ8ceBJJDtO7wvSp5ot0C71TF58GXldVH5pkfINIcgh4FvDnzaTfrarXN/N+kN645xP0Th2+5+RLWb2aDsdP07sS+PaqeuNkI1qeJP8E+C3gfj4/RvAH6I2DvAv4+8BHgddU1cKLHqZGkm3Av6+qK5K8iN5FX88D/hD4tqr6zATD0yrSlX0b1s7+Dd3dx5P8I3oXPp4BPAq8jl7zceq3X5IfBb6VXi3wh8B30hvnO5XbLsmdwDbgbOBJ4IeBd3KSbbWcGs/CWZIkSWrBoRqSJElSCxbOkiRJUgsWzpIkSVILFs6SJElSCxbOkiRJUgsWzpIkSVILFs6SJElSC/8fA6+SxudduSkAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Same filter and histograms as the previous cell, drawn with a logarithmic count axis (log=True)\n",
"df[(df['Quantity']>-50) & \n",
" (df['Quantity']<50) & \n",
" (df['UnitPrice']>0) & \n",
" (df['UnitPrice']<100)][['Quantity', 'UnitPrice']].hist(figsize=[12,4], bins=30, log=True)\n",
"plt.show()"
]
},
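{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two histogram cells above select rows with a boolean mask built from four comparisons joined by `&`. As a minimal sketch of that pattern, the next cell applies the same mask to a small plain-pandas DataFrame. The `toy` frame and its values are hypothetical and are not taken from the retail index; the cell only illustrates which rows such a mask keeps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from the retail index\n",
"toy = pd.DataFrame({\n",
"    'Quantity': [6, -80, 12, 200, 3],\n",
"    'UnitPrice': [2.55, 1.25, 0.0, 4.95, 15.0],\n",
"})\n",
"\n",
"# Each comparison yields a boolean Series; & combines them element-wise.\n",
"# The parentheses are required because & binds tighter than < and >.\n",
"mask = ((toy['Quantity'] > -50) & (toy['Quantity'] < 50) &\n",
"        (toy['UnitPrice'] > 0) & (toy['UnitPrice'] < 100))\n",
"kept = toy[mask][['Quantity', 'UnitPrice']]\n",
"kept"
]
},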
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:24.504460Z",
"iopub.status.busy": "2021-12-15T20:25:24.504086Z",
"iopub.status.idle": "2021-12-15T20:25:26.468550Z",
"shell.execute_reply": "2021-12-15T20:25:26.466711Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>CustomerID</th>\n",
" <th>...</th>\n",
" <th>StockCode</th>\n",
" <th>UnitPrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>United Kingdom</td>\n",
" <td>13748.0</td>\n",
" <td>...</td>\n",
" <td>22086</td>\n",
" <td>2.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>83</th>\n",
" <td>United Kingdom</td>\n",
" <td>15291.0</td>\n",
" <td>...</td>\n",
" <td>21733</td>\n",
" <td>2.55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>United Kingdom</td>\n",
" <td>14688.0</td>\n",
" <td>...</td>\n",
" <td>21212</td>\n",
" <td>0.42</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>United Kingdom</td>\n",
" <td>14688.0</td>\n",
" <td>...</td>\n",
" <td>85071B</td>\n",
" <td>0.38</td>\n",
" </tr>\n",
" <tr>\n",
" <th>176</th>\n",
" <td>United Kingdom</td>\n",
" <td>16029.0</td>\n",
" <td>...</td>\n",
" <td>85099C</td>\n",
" <td>1.65</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14784</th>\n",
" <td>United Kingdom</td>\n",
" <td>15061.0</td>\n",
" <td>...</td>\n",
" <td>22423</td>\n",
" <td>10.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14785</th>\n",
" <td>United Kingdom</td>\n",
" <td>15061.0</td>\n",
" <td>...</td>\n",
" <td>22075</td>\n",
" <td>1.45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14788</th>\n",
" <td>United Kingdom</td>\n",
" <td>15061.0</td>\n",
" <td>...</td>\n",
" <td>17038</td>\n",
" <td>0.07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14974</th>\n",
" <td>United Kingdom</td>\n",
" <td>14739.0</td>\n",
" <td>...</td>\n",
" <td>21704</td>\n",
" <td>0.72</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14980</th>\n",
" <td>United Kingdom</td>\n",
" <td>14739.0</td>\n",
" <td>...</td>\n",
" <td>22178</td>\n",
" <td>1.06</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
"<p>258 rows × 8 columns</p>"
],
"text/plain": [
" Country CustomerID ... StockCode UnitPrice\n",
"46 United Kingdom 13748.0 ... 22086 2.55\n",
"83 United Kingdom 15291.0 ... 21733 2.55\n",
"96 United Kingdom 14688.0 ... 21212 0.42\n",
"102 United Kingdom 14688.0 ... 85071B 0.38\n",
"176 United Kingdom 16029.0 ... 85099C 1.65\n",
"... ... ... ... ... ...\n",
"14784 United Kingdom 15061.0 ... 22423 10.95\n",
"14785 United Kingdom 15061.0 ... 22075 1.45\n",
"14788 United Kingdom 15061.0 ... 17038 0.07\n",
"14974 United Kingdom 14739.0 ... 21704 0.72\n",
"14980 United Kingdom 14739.0 ... 22178 1.06\n",
"\n",
"[258 rows x 8 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rows with more than 50 units ordered at a unit price below 100\n",
"df.query('Quantity>50 & UnitPrice<100')"
]
},
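{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `query` call above passes the filter as a string expression. In plain pandas, that expression selects the same rows as an explicit boolean mask, which the next cell checks as a rough sketch. The `toy` frame is hypothetical, and the cell says nothing about which pandas `query` features opensearch-py-ml supports beyond the call already shown."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from the retail index\n",
"toy = pd.DataFrame({\n",
"    'Quantity': [6, 60, 120, 75],\n",
"    'UnitPrice': [2.55, 0.42, 250.0, 1.65],\n",
"})\n",
"\n",
"# String form: column names are written bare inside the expression\n",
"by_query = toy.query('Quantity>50 & UnitPrice<100')\n",
"\n",
"# Mask form: each comparison is parenthesised before combining with &\n",
"by_mask = toy[(toy['Quantity'] > 50) & (toy['UnitPrice'] < 100)]\n",
"\n",
"# Evaluates to True when both selections contain the same rows\n",
"by_query.equals(by_mask)"
]
},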
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Arithmetic Operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numeric values: multiplying `Quantity` by `UnitPrice` element-wise gives the total price of each line item."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:26.483774Z",
"iopub.status.busy": "2021-12-15T20:25:26.482084Z",
"iopub.status.idle": "2021-12-15T20:25:26.907406Z",
"shell.execute_reply": "2021-12-15T20:25:26.906448Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 6\n",
"1 6\n",
"2 8\n",
"3 6\n",
"4 6\n",
"Name: Quantity, dtype: int64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Quantity'].head()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:26.912916Z",
"iopub.status.busy": "2021-12-15T20:25:26.910149Z",
"iopub.status.idle": "2021-12-15T20:25:27.361783Z",
"shell.execute_reply": "2021-12-15T20:25:27.362723Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 2.55\n",
"1 3.39\n",
"2 2.75\n",
"3 3.39\n",
"4 3.39\n",
"Name: UnitPrice, dtype: float64"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['UnitPrice'].head()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:27.383414Z",
"iopub.status.busy": "2021-12-15T20:25:27.374098Z",
"iopub.status.idle": "2021-12-15T20:25:27.387546Z",
"shell.execute_reply": "2021-12-15T20:25:27.388753Z"
}
},
"outputs": [],
"source": [
"product = df['Quantity'] * df['UnitPrice']"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:27.398754Z",
"iopub.status.busy": "2021-12-15T20:25:27.397557Z",
"iopub.status.idle": "2021-12-15T20:25:27.818022Z",
"shell.execute_reply": "2021-12-15T20:25:27.819640Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 15.30\n",
"1 20.34\n",
"2 22.00\n",
"3 20.34\n",
"4 20.34\n",
"dtype: float64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"product.head()"
]
},
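{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the multiplication above easy to check by hand, the next cell repeats it in plain pandas on the first three `Quantity` and `UnitPrice` values displayed earlier. This is only a sketch: the `sample` frame is a local copy built for illustration, and rounding to two decimals is an added step that hides floating-point noise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# First three rows as displayed by the head() cells above\n",
"sample = pd.DataFrame({\n",
"    'Quantity': [6, 6, 8],\n",
"    'UnitPrice': [2.55, 3.39, 2.75],\n",
"})\n",
"\n",
"# Series * Series multiplies values that share the same index label\n",
"line_total = sample['Quantity'] * sample['UnitPrice']\n",
"line_total.round(2)"
]
},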
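{
"cell_type": "markdown",
"metadata": {},
"source": [
"String concatenation: the `+` operator joins two string columns element-wise, here `Country` and `StockCode`, with no separator."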
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"execution": {
"iopub.execute_input": "2021-12-15T20:25:27.837007Z",
"iopub.status.busy": "2021-12-15T20:25:27.836370Z",
"iopub.status.idle": "2021-12-15T20:25:29.072872Z",
"shell.execute_reply": "2021-12-15T20:25:29.074153Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 United Kingdom85123A\n",
"1 United Kingdom71053\n",
"2 United Kingdom84406B\n",
"3 United Kingdom84029G\n",
"4 United Kingdom84029E\n",
" ... \n",
"14995 United Kingdom72349B\n",
"14996 United Kingdom72741\n",
"14997 United Kingdom22762\n",
"14998 United Kingdom21773\n",
"14999 United Kingdom22149\n",
"Length: 15000, dtype: object"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Country'] + df['StockCode']"
]
},
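{
"cell_type": "markdown",
"metadata": {},
"source": [
"The concatenated values above run together without a delimiter. In plain pandas, a string literal can be added between the two columns to keep the parts readable, as the sketch in the next cell shows. The `sample` frame reuses two values from the output above, and the `' / '` separator is an arbitrary choice; whether opensearch-py-ml accepts a literal in the middle of such an expression is an assumption this notebook does not test."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Two rows rebuilt from the concatenation output above\n",
"sample = pd.DataFrame({\n",
"    'Country': ['United Kingdom', 'United Kingdom'],\n",
"    'StockCode': ['85123A', '71053'],\n",
"})\n",
"\n",
"# Column + column, as in the cell above\n",
"joined = sample['Country'] + sample['StockCode']\n",
"\n",
"# Column + literal + column inserts a separator into every row\n",
"readable = sample['Country'] + ' / ' + sample['StockCode']\n",
"readable"
]
},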
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}