## Overlay Filesystems Demo

This notebook explores overlay filesystems and Docker's use of them. In it you will build and run some Docker containers and explore how the layered filesystem specified by a Dockerfile is represented on disk.

### Preparation

This lab walks through the Docker overlay filesystem. To make the folders created here easy to observe, you can run the commands below to reset the Docker installation and remove all cached images.

**Do not execute this on a production system or on one containing data you cannot afford to lose.**


The following two commands restart Docker, which clears out any running containers and further resets the demo environment:

In [None]:
sudo systemctl stop docker
sudo systemctl start docker

The following command prunes Docker's caches and storage: it removes all stopped containers, unused images, build cache, and unused volumes.

**This is a destructive command. Only use it on a test system, as it will remove data.**

In [None]:
sudo docker system prune --all --force --volumes

### Exploring Docker's use of filesystems

Docker stores container filesystems under /var/lib/docker/overlay2. Before you run any containers, the folder contains two objects:

In [None]:
sudo ls /var/lib/docker/overlay2 -l

Let's download a simple, single-layer image and check how it is represented in the filesystem:

In [None]:
docker pull amazonlinux:2

In [None]:
sudo ls /var/lib/docker/overlay2/ -l

Docker provides a metadata description of the image, which is accessible via the inspect command. In the JSON document returned, the overlay filesystem is documented under the GraphDriver section. We can extract the path and verify that it matches the folder shown above as follows:

In [None]:
docker inspect --format='{{.GraphDriver.Data.MergedDir}}' "amazonlinux:2"

In [None]:
FS_PATH=$(docker inspect --format='{{.GraphDriver.Data.MergedDir}}' "amazonlinux:2" | rev | cut -d/ -f2- | rev)
echo $FS_PATH
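
As a side note, the rev/cut/rev pipeline above simply drops the last path component, which dirname also does. The sketch below uses a made-up MergedDir value (the layer ID 0123abcd is hypothetical, real IDs are long hashes) so it can be run without Docker:

```shell
# Hypothetical MergedDir value standing in for the docker inspect output.
merged_dir="/var/lib/docker/overlay2/0123abcd/merged"

# dirname strips the final path component ("merged"),
# leaving the folder that holds this layer's diff and link entries.
layer_dir=$(dirname "$merged_dir")

echo "$layer_dir"
```

Either approach should give the same layer folder; dirname is just a little easier to read.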

Let's explore the contents of the container folder:

In [None]:
sudo ls $FS_PATH -l

The diff folder contains the data stored in this layer of the filesystem. We can explore this like any other folder:

In [None]:
sudo ls $FS_PATH/diff

This looks like a normal Linux root filesystem!

The other file present is a text file, link. We can read the contents of this file: 

In [None]:
sudo cat $FS_PATH/link

The link file maps back to a symlink stored within the 'l' folder in the root of the /var/lib/docker/overlay2/ folder, which in turn points back to the diff folder containing our container filesystem. 

This behavior is a Docker-specific implementation detail, and not something particular to the use of union/overlay filesystems:

In [None]:
sudo ls /var/lib/docker/overlay2/l -l
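
To see the link file and symlink relationship in isolation, the following sketch rebuilds the same shape in a scratch directory and follows the chain by hand. Everything here is a mock: the names layer-aaaa and SHORTID123 are hypothetical stand-ins for Docker's real identifiers, and no Docker data is touched.

```shell
# Build a throwaway mock of the overlay2 layout (all names are hypothetical).
root=$(mktemp -d)
mkdir -p "$root/layer-aaaa/diff" "$root/l"
echo "SHORTID123" > "$root/layer-aaaa/link"
ln -s ../layer-aaaa/diff "$root/l/SHORTID123"

# Follow the chain: link file -> symlink under l/ -> diff folder.
short_id=$(cat "$root/layer-aaaa/link")
resolved=$(readlink -f "$root/l/$short_id")
expected=$(readlink -f "$root/layer-aaaa/diff")

echo "link file contains:  $short_id"
echo "symlink resolves to: $resolved"
```

On a real host, the same idea would be to read $FS_PATH/link and resolve the matching entry under /var/lib/docker/overlay2/l with sudo.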

The other object in the root of the overlay2 folder is another Docker-specific implementation detail: the backingFsBlockDev file, a block device that maps to the root block device of the host OS:

In [None]:
sudo ls /var/lib/docker/overlay2/ -l
sudo lsblk

### Multiple Layers

In the layer-example folder, I have prepared a simple Dockerfile which presents a 3-layer filesystem:

- Layer 0: Base image (Amazon Linux 2)
- Layer 1: Adds a file: /hello
- Layer 2: Removes the file: /hello

Let's look at the Dockerfile:

In [None]:
cd layer-example
cat Dockerfile
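
If you do not have the layer-example folder to hand, a Dockerfile matching the three layers listed above might look roughly like the one written out below. This is an assumption for illustration only: the exact instructions in the real file (for example, how /hello is created) are not reproduced here and may differ.

```shell
# Write a hypothetical three-step Dockerfile into a scratch directory.
build_dir=$(mktemp -d)
cat > "$build_dir/Dockerfile" <<'EOF'
FROM amazonlinux:2
RUN touch /hello
RUN rm /hello
EOF

# Show the result; each line should become one layer when built.
cat "$build_dir/Dockerfile"
```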

Let's build the image:

In [None]:
docker build -t layer-example .

From the build log, you can see that Docker passed through three steps (one for each line in the Dockerfile) and created a layer for each.

Now let's list the Docker overlay2 folder again to see what's changed:

In [None]:
sudo ls /var/lib/docker/overlay2/ -latr

Two new directories have been created, but it's not clear which directory corresponds to which layer, or where these IDs have come from.

We can find the IDs by going back to the 'docker inspect' command and pulling these from the GraphDriver section. 

The Docker metadata includes a LowerDir value:

In [None]:
LOWER_DIRS=$(docker inspect --format='{{.GraphDriver.Data.LowerDir}}' "layer-example")
echo $LOWER_DIRS

This value lists the lower folders as a colon-separated hierarchy that stacks from right to left (the bottom layer in the filesystem is the last element in the list).
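
To make the ordering concrete, here is a small sketch that splits a LowerDir-style string on ':' and prints the layers from top to bottom. The paths (mid-layer, base-layer) are made-up stand-ins for real layer IDs, so the snippet runs without Docker:

```shell
# Hypothetical LowerDir value: uppermost lower layer first, base layer last.
lower_dirs="/var/lib/docker/overlay2/mid-layer/diff:/var/lib/docker/overlay2/base-layer/diff"

# Split on ':' into the positional parameters, then restore IFS.
old_ifs=$IFS
IFS=:
set -- $lower_dirs
IFS=$old_ifs

layer_count=$#
top_lower=$(dirname "$1")

# Walk the list in order; the last entry seen is the bottom (base) layer.
position=1
for dir in "$@"; do
    bottom_lower=$(dirname "$dir")
    echo "$position: $bottom_lower"
    position=$((position + 1))
done
```

With a real image, lower_dirs would instead come from the LowerDir field of docker inspect.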

We can extract the middle layer (first element in the list) with some shell cut commands:

In [None]:
MIDDLE_DIR=$(echo $LOWER_DIRS | cut -d':' -f1 | rev | cut -d/ -f2- | rev)
echo $MIDDLE_DIR

In [None]:
sudo ls $MIDDLE_DIR -l

Let's explore the diff folder in the middle layer, which contains the changes made in this layer:

In [None]:
sudo ls $MIDDLE_DIR/diff -l

So this diff shows the creation of the hello file. This corresponds to line 2 in our Dockerfile. 

To validate this, let's check the 'lower' file, which contains the ID of the layer that is below this one.

In [None]:
sudo cat $MIDDLE_DIR/lower

That's interesting, because this is the same value we saw earlier in the single-layer example. This is the ID of our base amazonlinux:2 image, and it shows how Docker uses the filesystem to navigate efficiently through layers, as well as to share layers between images built on the same base.

Next, let's explore the other newly created layer, which, logically, should be the top. We can extract this from the Docker metadata via the UpperDir value:

In [None]:
UPPER_DIR=$(docker inspect --format='{{.GraphDriver.Data.UpperDir}}' "layer-example" | rev | cut -d/ -f2- | rev)
echo $UPPER_DIR

In [None]:
sudo ls $UPPER_DIR

This looks very similar to the previous layer, but if we look at the lower file, we'll see the hierarchy:

In [None]:
sudo cat $UPPER_DIR/lower

So this is the top layer. In this layer, the hello file was removed. How does that work? 

In [None]:
sudo ls $UPPER_DIR/diff -l

To wrap up: the hello file was removed, and this is expressed by the 'c' at the start of the permissions string, which marks the entry as a character device. OverlayFS uses a character device with device number 0/0 (a whiteout) to tell the driver to present the unified filesystem without this file. As you can see from the walkthrough above, the original file still exists on disk in the middle layer; it is just hidden by this tombstone that sits over the top.
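
The tombstone check can also be scripted. The sketch below defines a hypothetical helper, is_whiteout, which treats a path as a whiteout only if it is a character device whose major and minor numbers are both 0. It assumes GNU stat (for the -c option) and deliberately tests only harmless paths, since creating a real whiteout needs root:

```shell
# Hypothetical helper: succeed only for a character device numbered 0,0.
is_whiteout() {
    [ -c "$1" ] || return 1
    # GNU stat: %t and %T print the major and minor device numbers in hex.
    [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# Neither a regular file nor /dev/null (a character device, but not 0,0)
# should be reported as a whiteout.
regular_file=$(mktemp)
for path in "$regular_file" /dev/null; do
    if is_whiteout "$path"; then
        echo "$path: whiteout"
    else
        echo "$path: not a whiteout"
    fi
done
rm -f "$regular_file"
```

On the demo host, something like sudo find "$UPPER_DIR/diff" -type c should list the same tombstone entries directly.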