<div style='font-size:200%;font-weight:bold'>Title</div><br>

This skeleton notebook includes a reference structure, standard stanzas, and tips to make a notebook
as ergonomic (for both (co-)authors and readers) and camera-ready as possible.

**NOTE:**
- Best viewed using Jupyter Lab.
- The title is a styled sentence rather than an `h1`, to prevent it from being shown and numbered in the TOC.

<font style='color:firebrick'>**NOTE:** this skeleton notebook is meant for reading. To run it,
please install the additional dependencies imported in the second cell below, which starts with the
line `# Dependencies required`.</font>

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%load_ext autoreload
%autoreload 2

# Make sure my_nb_path is imported first (and when isort is used, it needs to be told).
import my_nb_path  # isort: skip
from my_nb_color import print, pprint, inspect

In [None]:
# Dependencies required
import ndpretty
import numpy as np
import pandas as pd
import sagemaker as sm
from IPython.display import Markdown
from loguru import logger
from smallmatter.ds import mask_df  # See: https://github.com/aws-samples/smallmatter-package/

# A few standard SageMaker stanzas. Use type annotations to be explicit.
role: str = sm.get_execution_role()
sess = sm.Session()
region: str = sess.boto_session.region_name

# Global setup

This section contains Python variables that should be personalized, such as:
- the name of the Amazon S3 bucket and/or prefix, which may vary from one project member to another.
- the filename of the dataset to run.

We also show a pattern to automatically synchronize Python variables to environment variables.
The idea is to centralize all changes in this section only, so that you can safely run the remaining
cells without having to worry about outdated hardcoded values in the Python, `!`, and `%%` code.

<details><summary style="font-size:60%">Note on heading</summary>

> This section starts with an `h1` heading. Thus, it will appear in the TOC as "*1. Global setup*".
</details><br>

In [None]:
####################################################################################################
# Change me
####################################################################################################
bucket_name = "my-bucket-name"
prefix_name = "some/prefix"
####################################################################################################


####################################################################################################
# Do not change the next lines, as they're derived and will be recomputed automatically.
####################################################################################################
s3_prefix = f"s3://{bucket_name}/{prefix_name}".rstrip("/")

# Synchronize Python variable and environment variable.
%set_env S3_PREFIX=$s3_prefix
%env S3_PREFIX_CELL_SCOPE=$s3_prefix

# Demonstrate the difference between %env and %set_env.
!echo $S3_PREFIX_CELL_SCOPE  # Should print s3://my-bucket-name/some/prefix
!echo $S3_PREFIX             # Should print s3://my-bucket-name/some/prefix

In [None]:
# Demonstrate the difference between %env and %set_env
!echo $S3_PREFIX             # Should print s3://my-bucket-name/some/prefix
!echo $S3_PREFIX_CELL_SCOPE  # Should print an empty string
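
Outside of IPython magics, the same synchronization can be sketched in plain Python. The helper
`sync_to_env` below is hypothetical (it is not part of this notebook or of any library), and the
bucket and prefix values are the placeholders used above. It only illustrates the idea that child
processes started later inherit whatever was written to `os.environ`.

```python
import os
import subprocess
import sys


def sync_to_env(**variables: str) -> None:
    """Export each keyword argument as an upper-cased environment variable."""
    for name, value in variables.items():
        os.environ[name.upper()] = str(value)


bucket_name = "my-bucket-name"
prefix_name = "some/prefix"
s3_prefix = f"s3://{bucket_name}/{prefix_name}".rstrip("/")

# Keep the environment variable in lockstep with the Python variable.
sync_to_env(s3_prefix=s3_prefix)

# A child process (similar to what a `!` command spawns) sees the same value.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['S3_PREFIX'])"],
    capture_output=True,
    text=True,
    check=True,
)
child_view = child.stdout.strip()
print(child_view)
```

Re-running the helper after changing `bucket_name` or `prefix_name` refreshes the environment
variable, so later shell commands pick up the new value.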

The next cell demonstrates the benefit of keeping environment variables synchronized with Python
variables. By avoiding hardcoding, you can parameterize your `!` or `%%` commands instead of changing
hardcoded values scattered throughout this notebook.

*You can also use a raw cell instead of a markdown cell; however, note that raw cells may not be
rendered correctly outside of Jupyter Lab.*

```bash
####################################################################################################
# Demonstrate the benefit of avoiding hardcoding as much as possible.
####################################################################################################

# Whenever `bucket_name` and `prefix_name` are updated (i.e., the variables are changed and their
# cell is re-executed), the next command is guaranteed to list the updated Amazon S3 prefix.
!aws s3 ls --recursive $S3_PREFIX/


# SEGUE: since we're talking about aws-cli, here are a few tricks to read Amazon S3 files without
#        having to first download and save those files to your local filesystem.

# Show the first few rows of a file in Amazon S3. NOTE: you can safely ignore the broken pipe error.
!aws s3 cp $S3_PREFIX/haha.txt - | head

# List files in an archive
!aws s3 cp $S3_PREFIX/model.tar.gz - | tar -tzvf -
```
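
The archive-listing trick above also has a pure-Python counterpart. The sketch below uses
`tarfile` in streaming mode (`r|gz`), which reads a non-seekable byte stream without saving it to
disk first. The function name `list_tar_gz_members` is hypothetical, and the example feeds it an
in-memory archive; in practice the stream might be the body of an S3 object fetched with an SDK,
which is an assumption not exercised here.

```python
import io
import tarfile
from typing import BinaryIO, List


def list_tar_gz_members(stream: BinaryIO) -> List[str]:
    """Return member names of a gzip-compressed tar read sequentially from a stream."""
    # "r|gz" is tarfile's streaming mode: it only needs stream.read(), not seek().
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        return [member.name for member in tar]


# Build a tiny archive in memory to stand in for a remote model.tar.gz.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    payload = b"hello"
    info = tarfile.TarInfo(name="model/weights.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

members = list_tar_gz_members(buffer)
print(members)
```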

# Improved output

## Colored outputs

In [None]:
d = {"A" * 200, "B" * 200}
print("Colored:", d)
pprint("Colored and wrapped:", d)
display(d)

for f in (logger.debug, logger.info, logger.success, logger.error):
    f("Hello World!")

Colored: [1m{[0m[32m'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'[0m, [32m'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'[0m[1m}[0m
Colored and wrapped:
[1m{[0m
    [32m'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[0m
[32mAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[0m
[32mAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'[0m,
    [32m'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB[0m
[32mBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB[0m
[32mBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'[0m
[1m}[0m

[1m{[0m
    [32m'AAAAAAAAAAAAA

2022-01-22 17:23:03.529 | DEBUG    | __main__:<module>:7 - Hello World!
2022-01-22 17:23:03.530 | INFO     | __main__:<module>:7 - Hello World!
2022-01-22 17:23:03.531 | SUCCESS  | __main__:<module>:7 - Hello World!
2022-01-22 17:23:03.532 | ERROR    | __main__:<module>:7 - Hello World!


## Dataframes

In [None]:
def mask_userid(df: pd.DataFrame) -> pd.DataFrame:
    return mask_df(df, cols=["userid"])


df_a = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 5, 6],
    }
)
df_b = pd.DataFrame(
    {
        "userid": [1000, 2000, 3000],  # Illustration only. Usually read from somewhere.
        "pca_a": [0.1, 0.2, 0.3],
        "pca_b": [-0.3, 0.01, 0.7],
    }
)

display(
    Markdown(
        '### Plain dataframe\n**NOTE:** this also appears in TOC as "*2.2.1. Plain dataframe*"'
    ),
    df_a,
    Markdown(
        """### Masked dataframe
Sometimes we would like to version the output of this cell in the git repo, to help readers
quickly see the shape of a dataframe.

However, when the dataframe contains sensitive values, care must be taken to
**<font style='color:firebrick;background-color:yellow'>NEVER</font>** version these values to git.
Otherwise, as you all know, once checked into the git history, it can be tedious and challenging to
undo the versioning.
"""
    ),
    mask_userid(df_b),
)

### Plain dataframe
**NOTE:** this also appears in TOC as "*2.2.1. Plain dataframe*"

|   | a | b |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |


### Masked dataframe
Sometimes we would like to version the output of this cell in the git repo, to help readers
quickly see the shape of a dataframe.

However, when the dataframe contains sensitive values, care must be taken to
**<font style='color:firebrick;background-color:yellow'>NEVER</font>** version these values to git.
Otherwise, as you all know, once checked into the git history, it can be tedious and challenging to
undo the versioning.


|   | userid | pca_a | pca_b |
|---|--------|-------|-------|
| 0 | xxx    | 0.1   | -0.3  |
| 1 | xxx    | 0.2   | 0.01  |
| 2 | xxx    | 0.3   | 0.7   |
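
For readers without `smallmatter` installed, the masking idea can be approximated with plain
pandas. The helper `mask_columns` below is a hypothetical stand-in, not the implementation of
`mask_df`; it simply overwrites the chosen columns with a placeholder on a copy, so the original
dataframe stays usable for computation while only the masked copy is displayed.

```python
from typing import Iterable

import pandas as pd


def mask_columns(df: pd.DataFrame, cols: Iterable[str], placeholder: str = "xxx") -> pd.DataFrame:
    """Return a copy of df whose selected columns are replaced by a placeholder."""
    masked = df.copy()
    for col in cols:
        masked[col] = placeholder
    return masked


df_sensitive = pd.DataFrame(
    {
        "userid": [1000, 2000, 3000],  # Illustration only.
        "pca_a": [0.1, 0.2, 0.3],
    }
)
df_masked = mask_columns(df_sensitive, cols=["userid"])
print(df_masked)
```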


## Pretty ndarray display

Example: display a 2D tensor.

In [None]:
# Affect globally
ndpretty.default()
np.random.rand(9, 9)

# NOTE: without ndpretty.default(), use this form:
# ndpretty.ndarray_html(np.random.rand(3, 4))

# NOTE: the rendered output won't persist.
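
Because the rich rendering does not persist in the saved notebook, a plain-text rendering may be
a useful fallback when the output has to be versioned. The sketch below relies only on NumPy's
own print options, not on `ndpretty`; the seed and array shape are arbitrary choices for
illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
arr = rng.random((3, 4))

# Temporarily shorten the float formatting; settings revert when the block exits.
with np.printoptions(precision=3, suppress=True):
    text = str(arr)

print(text)
```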

# Summary

When this notebook should be versioned without outputs, do a *Clear All Outputs*.

When there are outputs to be versioned (as this skeleton notebook does), consider removing the
cell execution counts.
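
As a rough illustration of what removing the cell counts means, the sketch below resets the
`execution_count` fields of a notebook that has been loaded as a dictionary (the `.ipynb` format is
JSON). The function name `clear_execution_counts` is hypothetical, and this is not the bash script
mentioned in the footnote; it only shows the fields involved, using a tiny in-memory notebook.

```python
import json
from typing import Any, Dict


def clear_execution_counts(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Set execution counts of code cells and their outputs to None, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            # Only execute_result outputs carry their own execution_count.
            if "execution_count" in output:
                output["execution_count"] = None
    return notebook


example_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [
                {"output_type": "execute_result", "execution_count": 7, "data": {}, "metadata": {}}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_execution_counts(example_nb)
print(json.dumps(cleaned["cells"][1]["execution_count"]))
```

After this step the cell prompts render without a number, and diffs no longer change just because
cells were re-run in a different order.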

**Advanced git tip:** selectively choose which hunks (i.e., portions) of unstaged changes to stage
using `git add -i filename.ipynb`. For instance, you may edit this notebook on another machine with
a different Python version or environment name, and these minutiae need not be committed.
Please refer to the git documentation for more details. TL;DR: choose `5: patch`, then decide
what to do with each hunk. Best combined with `nbdiff` for a more intuitive diff view.

<details><summary style="font-size:60%">Footnote</summary>

> This skeleton notebook was run through the
> [clr-nb-xcnt.sh](https://github.com/verdimrc/pyutil/blob/master/bin/clr-nb-xcnt.sh) bash script
> to clear its cell counts.
>
> DISCLAIMER: the script is provided on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
> OF ANY KIND, either express or implied, including, without limitation, any warranties or
> conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You
> are solely responsible for determining the appropriateness of using or redistributing the Work and
> assume any risks associated with Your exercise of permissions under this Apache License 2.0.
</details><br>