+++
title = "Step 6 - Preload the items for the table Scan exercise"
date = 2019-12-02T10:20:18-08:00
weight = 60
+++

{{% notice info %}}
_Reminder: All commands are executed in the shell console connected to the EC2 instance, not your local machine. (If you are not sure you can always validate going back to [step 1](/design-patterns/setup/step1.html))_
{{% /notice %}}

In the upcoming Exercise #2 we will discuss table scan and its best practices. In this step, let's populate the table with 1 million items in preparation for that exercise.

Run the command to create a new table:

```bash
aws dynamodb create-table --table-name logfile_scan \
--attribute-definitions AttributeName=PK,AttributeType=S AttributeName=GSI_1_PK,AttributeType=S AttributeName=GSI_1_SK,AttributeType=S \
--key-schema AttributeName=PK,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5000,WriteCapacityUnits=5000 \
--tags Key=workshop-design-patterns,Value=targeted-for-cleanup \
--global-secondary-indexes "IndexName=GSI_1,\
KeySchema=[{AttributeName=GSI_1_PK,KeyType=HASH},{AttributeName=GSI_1_SK,KeyType=RANGE}],\
Projection={ProjectionType=KEYS_ONLY},\
ProvisionedThroughput={ReadCapacityUnits=3000,WriteCapacityUnits=5000}"
```

This command will create a new table and one GSI with the following definition:

#### Table: logfile_scan

- Key schema: HASH
- Table RCU = 5000
- Table WCU = 5000
- GSI(s):
  - GSI_1 (3000 RCU, 5000 WCU) - _Allows for parallel or sequential scans of the access logs. Sorted by status code and timestamp._

| Attribute name (Type) | Special attribute? |                       Attribute use case                        | Sample attribute value |
| --------------------- | :----------------: | :-------------------------------------------------------------: | ---------------------: |
| PK (STRING)           |      Hash key      |             Holds the request id for the access log             |       _request#104009_ |
| GSI_1_PK (STRING)     |   GSI 1 hash key   |       A shard key, with values 0-N, to allow log searches       |              _shard#3_ |
| GSI_1_SK (STRING)     |   GSI 1 sort key   | Sorts the logs hierarchically, from status code -> date -> hour |    _200#2019-09-21#01_ |

Run the command to wait until the table becomes Active:

```bash
aws dynamodb wait table-exists --table-name logfile_scan
```

#### Populate the table

Run the following command to load the server logs data into the logfile_scan table. It will load 1,000,000 rows to the table.

```bash
nohup python load_logfile_parallel.py logfile_scan &
disown
```

`nohup` is used to run the process in the background, and `disown` allows the load to continue in case you are disconnected.

{{% notice note %}}
_The following command will take about ten minutes to complete. It will run in the background._
{{% /notice %}}

Run `pgrep -l python` to verify the script is loading data in the background.

```bash
pgrep -l python
```

Output:

```txt
3257 python
```

{{% notice note %}}
_The process id - the 4 digit number in the above example - will be different for everyone._
{{% /notice %}}

The script will continue to run in the background while you work on the next exercise.

**You have completed the SETUP!**