# Hello docker (for Data Scientists & Developers)

Source: https://docs.docker.com/get-started/

---
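### Docker concepts and benefits

A Docker container is a lightweight computing environment that runs an application or process in isolation. Docker offers the following benefits, which have made it one of the most widely adopted runtime environments in recent years.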

- Flexible: Even the most complex applications can be containerized.
- Lightweight: Containers leverage and share the host kernel, making them much more efficient in terms of system resources than virtual machines.
- Portable: You can build locally, deploy to the cloud, and run anywhere.
- Loosely coupled: Containers are highly self-sufficient and encapsulated, allowing you to replace or upgrade one without disrupting others.
- Scalable: You can increase and automatically distribute container replicas across a datacenter.
- Secure: Containers apply aggressive constraints and isolations to processes without any configuration required on the part of the user.

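These benefits are especially valuable for machine learning runtime environments, for the following reasons:

- The complex dependencies of machine learning code are defined as code, so the same runtime environment can be recreated at any time.
- Training typically requires large-scale parallel computing, but the amount needed and the timing are hard to predict. Containers make resource management more efficient in environments such as machine learning, where resources must be allocated dynamically.
- The extra work at the deployment stage drops significantly. Machine learning models are deployed to targets ranging from cloud servers to mobile devices and IoT edge devices, and containers reduce the trial and error of configuring each environment.
- Automated retraining and automated deployment pipelines become easier to set up.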

### Docker architecture

![](https://docs.docker.com/engine/images/architecture.svg)


### Basic Docker commands

Docker has many complex commands and settings, but a data scientist or developer, as opposed to an infrastructure operator, needs to know only a few of them. This notebook uses examples to show what the following basic commands do and how to use them.

- docker build
- docker run
- docker pull/push
- docker images
- docker ps

---
### Checking the Docker environment

Docker is already installed in the SageMaker notebook environment.

In [1]:
!docker --version

Docker version 19.03.6-ce, build 369ce74


The `docker images` command lists the images stored in the local Docker image repository.

In [6]:
!docker images

REPOSITORY TAG IMAGE ID CREATED SIZE
308961792850.dkr.ecr.us-east-1.amazonaws.com/rmars latest 61ea9cc60ed1 3 days ago 779MB
rmars latest 61ea9cc60ed1 3 days ago 779MB
<none> <none> a0948009e757 3 days ago 779MB
ubuntu 16.04 dfeff22e96ae 3 weeks ago 131MB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife latest 67ba4376c4ec 2 months ago 3.58GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife <none> 7d4aedb40ec3 2 months ago 3.58GB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training 2.2.0-cpu-py37 7a8906b92f39 2 months ago 2.95GB
tensorflow/tensorflow latest-gpu-jupyter f0b0261fec71 3 months ago 3.3GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/torchserve v1 e8118c508b9d 3 months ago 2.75GB
torchserve v1 e8118c508b9d 3 months ago 2.75GB
ubuntu 18.04 2eb2d388e1a2 3 months ago 64.2MB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training 2.1-cpu-py3 eaca4ea179b1 4 months ago 2.11GB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference 2.1-cpu a24d00bf4158 4 months ago

The `docker ps` command lists containers. With the `-a` option it also shows containers that have already exited.

In [5]:
!docker ps -a

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES


The command in the next cell pulls the image below from the Docker registry and copies it to the local machine.
- https://hub.docker.com/_/busybox

In [39]:
!docker pull busybox

Using default tag: latest
latest: Pulling from library/busybox

Digest: sha256:a9286defaba7b3a519d585ba0e37d0b2cbee74ebfe590960b0b1d6a5e97d1e1d
Status: Downloaded newer image for busybox:latest
docker.io/library/busybox:latest
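The pull output above shows the short name `busybox` being expanded to `docker.io/library/busybox:latest`. The following is a minimal sketch of that expansion, written only to illustrate the naming rule. The function name `normalize_image_ref` is hypothetical, and the logic is a simplification that ignores digest references such as `name@sha256:...`.

```python
def normalize_image_ref(ref: str) -> str:
    """Expand a short image name into registry/path:tag form (simplified sketch)."""
    first, sep, rest = ref.partition("/")
    # Treat the first component as a registry host only if it looks like one.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        registry, path = first, rest
    else:
        registry, path = "docker.io", ref
    # Official Docker Hub images live under the "library/" namespace.
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path
    # A tag is a ":" in the last path component; the default tag is "latest".
    last = path.rsplit("/", 1)[-1]
    if ":" not in last:
        path += ":latest"
    return f"{registry}/{path}"


print(normalize_image_ref("busybox"))
print(normalize_image_ref("tensorflow/tensorflow:latest-gpu-jupyter"))
```

Names that already include a registry host, such as the ECR images in the listings, keep that host and only gain the default tag when none is given.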


In [40]:
!docker images

REPOSITORY TAG IMAGE ID CREATED SIZE
hello_docker latest 0ce2e15a7669 18 minutes ago 444MB
<none> <none> 299cdfc7a02d 3 hours ago 444MB
<none> <none> 954bb26bbc65 3 hours ago 444MB
308961792850.dkr.ecr.us-east-1.amazonaws.com/rmars latest 61ea9cc60ed1 3 days ago 779MB
rmars latest 61ea9cc60ed1 3 days ago 779MB
<none> <none> a0948009e757 3 days ago 779MB
ubuntu 16.04 dfeff22e96ae 3 weeks ago 131MB
busybox latest f0b02e9d092d 4 weeks ago 1.23MB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife latest 67ba4376c4ec 2 months ago 3.58GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife <none> 7d4aedb40ec3 2 months ago 3.58GB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training 2.2.0-cpu-py37 7a8906b92f39 2 months ago 2.95GB
tensorflow/tensorflow latest-gpu-jupyter f0b0261fec71 3 months ago 3.3GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/torchserve v1 e8118c508b9d 3 months ago 2.75GB
torchserve v1 e8118c508b9d 3 months ago 2.75GB
ubuntu 18.04 2eb2d388e1a2 3 months ago 64.2MB
763104351884.dkr.ecr.us-east-1.amazonaws.com

In [42]:
!docker run busybox echo "hello docker!!"

hello docker!!


In [44]:
!docker ps -a

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e001d10ddebf busybox "echo 'hello docker!…" 12 seconds ago Exited (0) 11 seconds ago tender_maxwell
e49c9e1147bc busybox "sh" 51 seconds ago Exited (0) 50 seconds ago hardcore_hawking
1fe657c2326a hello_docker "python hello_docker…" 19 minutes ago Exited (0) 19 minutes ago determined_leakey
61536dd1b3c4 719c1148fe65 "python hello_docker…" 3 hours ago Exited (0) 3 hours ago kind_meninsky
70bd5030457b 719c1148fe65 "/bin/bash" 3 hours ago Exited (0) 3 hours ago friendly_lumiere
b6b2693318f1 299cdfc7a02d "/bin/bash" 3 hours ago Exited (0) 3 hours ago musing_rubin
a83b0b6d4ae3 719c1148fe65 "/bin/bash" 3 hours ago Exited (126) 3 hours ago hopeful_saha
4c9959a1e1d1 719c1148fe65 "/bin/bash" 3 hours ago Exited (130) 3 hours ago hopeful_shockley
0d5af16e7512 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.2.0-cpu-py37 "train" 2 months ago Exited (0) 2 months ago tmp1eoweitw_algo-1-cnr6r_1
7019aa611b4d 308961792850.dkr.ecr.u
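The STATUS column above records each container's exit code in parentheses, for example `Exited (126)` and `Exited (130)`. The sketch below pulls that number out of a status string and gives a rough interpretation. The helper names are hypothetical, and the interpretation follows the common shell and `docker run` conventions (126 for a command that could not be invoked, 127 for a command that was not found, values above 128 for termination by a signal), so treat it as a guide rather than a diagnosis.

```python
import re

# Matches the STATUS value of an exited container, e.g. "Exited (126) 3 hours ago".
_EXITED = re.compile(r"Exited \((\d+)\)")


def exit_code(status: str):
    """Return the exit code from a `docker ps -a` STATUS value, or None if absent."""
    match = _EXITED.search(status)
    return int(match.group(1)) if match else None


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the common shell and `docker run` conventions."""
    if code == 0:
        return "success"
    if code == 126:
        return "command could not be invoked"
    if code == 127:
        return "command not found"
    if code > 128:
        return f"terminated by signal {code - 128}"
    return "error reported by the program"


for status in ["Exited (0) 11 seconds ago", "Exited (126) 3 hours ago", "Exited (130) 3 hours ago"]:
    code = exit_code(status)
    print(code, describe_exit_code(code))
```

Under these conventions, exit code 130 corresponds to signal 2 (SIGINT), which is what an interactive shell typically reports after Ctrl-C.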

---
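### Docker build

A Dockerfile defines the runtime environment:
- It uses ubuntu 16.04 as the base image.
- It installs the additional tools and services that are needed, such as wget, python, and nginx.
- It installs the dependency libraries that the user program needs.
- It sets environment variables and the working directory.
- The last line copies the user-defined test program file into the image.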

In [29]:
%%writefile Dockerfile
# Use the official image as a parent image.
FROM ubuntu:16.04

# Install tools and utilities 
RUN apt-get -y update && apt-get install -y --no-install-recommends \
 wget \
 python \
 nginx \
 ca-certificates \
&& rm -rf /var/lib/apt/lists/*

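# Install Python dependencies.
# Note: `python` on ubuntu:16.04 is Python 2.7. The current get-pip.py no longer
# supports 2.7, so a fresh build may need https://bootstrap.pypa.io/pip/2.7/get-pip.py instead.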
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && \
 pip install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gevent gunicorn && \
 (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && \
 rm -rf /root/.cache

# set env variables 
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

# Set the working directory.
WORKDIR /opt/program

# Copy the file from your host to your current location.
COPY hello_docker.py .


Overwriting Dockerfile


Create a simple Python script to use for testing. (This is the file that the Dockerfile copies into the container to be run.)

In [35]:
%%writefile hello_docker.py
import numpy as np
import pandas as pd

x = [1,2,3]
pd.DataFrame(x)
print(x)
print('pandas library was installed and runs well!')

Overwriting hello_docker.py


Build the Docker image from the Dockerfile defined above.

In [36]:
!docker build -t hello_docker .

Sending build context to Docker daemon 241.7kB
Step 1/8 : FROM ubuntu:16.04
 ---> dfeff22e96ae
Step 2/8 : RUN apt-get -y update && apt-get install -y --no-install-recommends wget python nginx ca-certificates && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 65f40e9b009e
Step 3/8 : RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py && pip install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gevent gunicorn && (cd /usr/local/lib/python2.7/dist-packages/scipy/.libs; rm *; ln ../../numpy/.libs/* .) && rm -rf /root/.cache
 ---> Using cache
 ---> d01b9b443467
Step 4/8 : ENV PYTHONUNBUFFERED=TRUE
 ---> Using cache
 ---> 9608f2802fef
Step 5/8 : ENV PYTHONDONTWRITEBYTECODE=TRUE
 ---> Using cache
 ---> 80240e5319f9
Step 6/8 : ENV PATH="/opt/program:${PATH}"
 ---> Using cache
 ---> 0c2ba8be566e
Step 7/8 : WORKDIR /opt/program
 ---> Using cache
 ---> dc21030805a7
Step 8/8 : COPY hello_docker.py .
 ---> 0ce2e15a7669
Successfully built 0ce2e15a7669
Successfully t
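In the build log above, a `---> Using cache` line means Docker reused the layer from an earlier build instead of running that step again, which is why only the final `COPY` step produced a new layer. The sketch below counts the steps and the cached steps in a log of this shape. The function name `cache_summary` is hypothetical, `sample_log` is a shortened, renumbered excerpt for illustration, and the parsing assumes the classic builder format shown in this notebook (newer BuildKit output is formatted differently).

```python
def cache_summary(build_log: str):
    """Return (total steps, steps served from cache) for a classic-builder log."""
    lines = build_log.splitlines()
    steps = sum(1 for line in lines if line.startswith("Step "))
    cached = sum(1 for line in lines if line.strip() == "---> Using cache")
    return steps, cached


# Shortened, illustrative excerpt in the same format as the log above.
sample_log = """\
Step 1/3 : FROM ubuntu:16.04
 ---> dfeff22e96ae
Step 2/3 : WORKDIR /opt/program
 ---> Using cache
 ---> dc21030805a7
Step 3/3 : COPY hello_docker.py .
 ---> 0ce2e15a7669
"""

print(cache_summary(sample_log))  # (3, 1)
```

Because the steps before `COPY` are unchanged between builds, editing only `hello_docker.py` keeps the expensive `apt-get` and `pip install` layers cached.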

In [37]:
!docker images

REPOSITORY TAG IMAGE ID CREATED SIZE
hello_docker latest 0ce2e15a7669 1 second ago 444MB
<none> <none> 299cdfc7a02d 3 hours ago 444MB
<none> <none> 954bb26bbc65 3 hours ago 444MB
308961792850.dkr.ecr.us-east-1.amazonaws.com/rmars latest 61ea9cc60ed1 3 days ago 779MB
rmars latest 61ea9cc60ed1 3 days ago 779MB
<none> <none> a0948009e757 3 days ago 779MB
ubuntu 16.04 dfeff22e96ae 3 weeks ago 131MB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife latest 67ba4376c4ec 2 months ago 3.58GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife <none> 7d4aedb40ec3 2 months ago 3.58GB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training 2.2.0-cpu-py37 7a8906b92f39 2 months ago 2.95GB
tensorflow/tensorflow latest-gpu-jupyter f0b0261fec71 3 months ago 3.3GB
torchserve v1 e8118c508b9d 3 months ago 2.75GB
308961792850.dkr.ecr.us-east-1.amazonaws.com/torchserve v1 e8118c508b9d 3 months ago 2.75GB
ubuntu 18.04 2eb2d388e1a2 3 months ago 64.2MB
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training 2.1-cpu-py3 eaca4ea179b1 4 m
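Rows in the listing above that carry only an image ID, with no usable repository name or tag, are untagged images, which Docker reports as `<none>`. They are typically left behind when a rebuild moves a tag such as `hello_docker:latest` to a newer image. The sketch below finds them in JSON-formatted output. It assumes the listing was produced with `docker images --format '{{json .}}'`, which prints one JSON object per line with `Repository`, `Tag`, and `ID` keys. The function name and the sample data are hypothetical.

```python
import json


def dangling_image_ids(json_lines: str):
    """Return the IDs of images whose repository and tag are both "<none>"."""
    ids = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        image = json.loads(line)
        if image.get("Repository") == "<none>" and image.get("Tag") == "<none>":
            ids.append(image["ID"])
    return ids


# Hypothetical sample in the one-object-per-line shape described above.
sample = "\n".join([
    json.dumps({"Repository": "hello_docker", "Tag": "latest", "ID": "0ce2e15a7669"}),
    json.dumps({"Repository": "<none>", "Tag": "<none>", "ID": "299cdfc7a02d"}),
])

print(dangling_image_ids(sample))
```

Docker can also do this filtering itself with `docker images --filter dangling=true`, and `docker image prune` removes such images.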

---
### Running the container

The `docker run` command runs the image that was just built.
- The cell below starts a container from the `hello_docker` image and runs `python hello_docker.py` inside it.

In [38]:
!docker run hello_docker python hello_docker.py

[1, 2, 3]
pandas library was installed and runs well!
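Every `docker run` leaves a stopped container behind, which is why the `docker ps -a` listings in this notebook keep growing. One option is the `--rm` flag, which removes the container as soon as it exits. The sketch below only builds the argument list for such a call. The helper name `docker_run_argv` is hypothetical, and nothing is executed until the list is passed to something like `subprocess.run`.

```python
import shlex


def docker_run_argv(image: str, command: str, remove: bool = True):
    """Build the argument list for `docker run`, optionally with --rm."""
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")  # delete the container once it exits
    argv.append(image)
    argv.extend(shlex.split(command))  # split the command as a shell would
    return argv


print(docker_run_argv("hello_docker", "python hello_docker.py"))
```

To actually run the container from Python, the list can be handed to `subprocess.run(argv, check=True)` on a machine where Docker is available.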


In [34]:
!docker ps -a

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
61536dd1b3c4 hello_docker "python hello_docker…" 2 hours ago Exited (0) 2 hours ago kind_meninsky
70bd5030457b hello_docker "/bin/bash" 2 hours ago Exited (0) 2 hours ago friendly_lumiere
b6b2693318f1 299cdfc7a02d "/bin/bash" 2 hours ago Exited (0) 2 hours ago musing_rubin
a83b0b6d4ae3 hello_docker "/bin/bash" 2 hours ago Exited (126) 2 hours ago hopeful_saha
4c9959a1e1d1 hello_docker "/bin/bash" 2 hours ago Exited (130) 2 hours ago hopeful_shockley
0d5af16e7512 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.2.0-cpu-py37 "train" 2 months ago Exited (0) 2 months ago tmp1eoweitw_algo-1-cnr6r_1
7019aa611b4d 308961792850.dkr.ecr.us-east-1.amazonaws.com/hwlife "/bin/bash" 2 months ago Exited (0) 2 months ago zen_hypatia
7e38990c7e6d 7d4aedb40ec3 "/bin/bash" 2 months ago Exited (0) 2 months ago infallible_margulis
806886fd0b1b 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.1-cpu "serve" 3 months ag

---
### Connecting to AWS ECR

The following shell script creates a repository named "hello-docker" in AWS ECR, logs in to it, and pushes the image built earlier to that repository.

In [60]:
%%sh
# Get the account number associated with the current IAM credentials
account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
image="hello-docker"

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image}:latest"
echo ${fullname}

# 1) If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
 aws ecr create-repository --repository-name "${image}" > /dev/null
fi

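# 2) Log in to ECR
aws ecr get-login-password --region "${region}" | docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"

# 3) Tag the locally built image with the full ECR name
docker tag hello_docker "${fullname}"

# 4) Push the image to ECR
docker push "${fullname}"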

308961792850.dkr.ecr.us-east-1.amazonaws.com/hello-docker:latest
Login Succeeded
The push refers to repository [308961792850.dkr.ecr.us-east-1.amazonaws.com/hello-docker]
b22e671e4096: Preparing
65975c96572d: Preparing
8c48a68852f0: Preparing
12597f08af5b: Preparing
9edaa71ce233: Preparing
62fdddf6a67c: Preparing
eff16de3ff64: Preparing
61727f5e6796: Preparing
62fdddf6a67c: Waiting
61727f5e6796: Waiting
b22e671e4096: Pushed
9edaa71ce233: Pushed
65975c96572d: Pushed
62fdddf6a67c: Pushed
eff16de3ff64: Pushed
12597f08af5b: Pushed
61727f5e6796: Pushed
8c48a68852f0: Pushed
latest: digest: sha256:650c378f5d2ac0548506eb645f6e73f6287ae3cbdfc1006f1b1cf1f680e188d2 size: 1988


https://docs.docker.com/engine/reference/commandline/login/#credentials-store



When the script finishes, go to [ECR](https://console.aws.amazon.com/ecr/repositories) in the AWS console and check the new repository and the pushed image.
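The full image name used in the script follows the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`, and an image must carry that name before `docker push` can send it to ECR. The sketch below composes the name and the matching `docker tag` and `docker push` argument lists in Python. The function names are hypothetical, and the code assumes the standard `amazonaws.com` domain used in this notebook. It only builds strings and lists and does not call AWS or Docker.

```python
def ecr_registry(account: str, region: str) -> str:
    """Return the ECR registry host for an account and region."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the full image name that `docker push` expects for ECR."""
    return f"{ecr_registry(account, region)}/{repository}:{tag}"


def push_commands(local_image: str, account: str, region: str, repository: str, tag: str = "latest"):
    """Return the argument lists for tagging a local image and pushing it."""
    uri = ecr_image_uri(account, region, repository, tag)
    return [
        ["docker", "tag", local_image, uri],
        ["docker", "push", uri],
    ]


print(ecr_image_uri("308961792850", "us-east-1", "hello-docker"))
for argv in push_commands("hello_docker", "308961792850", "us-east-1", "hello-docker"):
    print(" ".join(argv))
```

The first printed line matches the name echoed by the script output above.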
