+++
title = "Step 1 - Execute a sequential Scan"
date = 2019-12-02T10:35:42-08:00
weight = 1
+++

First, execute a simple (sequential) `Scan` to calculate the total bytes sent for all records with `response code <> 200`. You can run a `Scan` with a filter expression to filter out unrelated records. The application worker will then sum the values of all the returned records where `response code <> 200`.

The Python file, `scan_logfile_simple.py`, includes the command to run a `Scan` with a filter expression and then calculate the sum of bytes sent.

The following code block scans the table.

```py
  fe = "responsecode <> :f"
  eav = {":f": 200}
  response = table.scan(
      FilterExpression=fe,
      ExpressionAttributeValues=eav,
      Limit=pageSize,
      ProjectionExpression='bytessent')
```

{{% notice info %}}
You can review the file on your own with `vim ~/workshop/scan_logfile_simple.py`. Type `:q` and hit enter to exit vim.
{{% /notice %}}

Notice that there is a `Limit` parameter set in the `Scan` command. A single `Scan` operation will read up to the maximum number of items set (if using the `Limit` parameter) or a maximum of 1 MB of data, and then apply any filtering to the results by using `FilterExpression`. If the total number of scanned items exceeds the maximum set by the limit parameter or the data set size limit of 1 MB, the scan stops and results are returned to the user as a `LastEvaluatedKey` value. This value can be used in a subsequent operation so that you can pick up where you left off.

In the following code, the `LastEvaluatedKey` value in the response is passed to the `Scan` method via the `ExclusiveStartKey` parameter.

```py
  while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=fe,
        ExpressionAttributeValues=eav,
        Limit=pageSize,
        ExclusiveStartKey=response['LastEvaluatedKey'],
        ProjectionExpression='bytessent')
    for i in response['Items']:
        totalbytessent += i['bytessent']
```

When the last page is returned, `LastEvaluatedKey` is not part of the response, so you know that the scan is complete.

Now, execute this code.

```bash
python scan_logfile_simple.py logfile_scan
```

**Parameters:** Tablename: `logfile_scan`

The output will look like the following.

```txt
Scanning 1 million rows of table logfile_scan to get the total of bytes sent
Total bytessent 6054250 in 16.325594425201416 seconds
```

Make a note of the time it took for the scan to complete. With this exercise's dataset, a parallel scan will complete faster than the sequential scan.