# Use Script Mode to train any TensorFlow script from GitHub in SageMaker

In this tutorial, you train a TensorFlow script in SageMaker using the new Script Mode TensorFlow container.

For this example, you use [Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow), but you can use the same technique for other scripts or repositories, such as the [TensorFlow Model Zoo](https://github.com/tensorflow/models) and the [TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).

## Set up the environment
Let's start by creating a SageMaker session and specifying the following:
- The S3 bucket and prefix to use for training and model data. The bucket should be in the same region as the Notebook Instance, training instance(s), and hosting instance(s). This example uses the default bucket that a SageMaker `Session` creates.
- The IAM role that allows SageMaker services to access your data. For more information about using IAM roles in SageMaker, see [Amazon SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).


In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()

role = sagemaker.get_execution_role()

### Clone the repository
Run the following command to clone the repository that contains the example:

In [None]:
!git clone https://github.com/sherjilozair/char-rnn-tensorflow > /dev/null 2>&1

This repository includes a README.md with an overview of the project, requirements, and basic usage:

In [None]:
from IPython.display import display, Markdown

# Pass the path via `filename` so the file's contents are rendered, not the path string
display(Markdown(filename='char-rnn-tensorflow/README.md'))

### Get the data
For training data, use plain text versions of Sherlock Holmes stories.

In [None]:
!mkdir sherlock
!wget https://sherlock-holm.es/stories/plain-text/cnus.txt --force-directories --output-document=sherlock/input.txt

## Test locally

Script Mode is still in development, so this example needs a custom Estimator class to use it. See the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) for more information.


In [None]:
import boto3
from sagemaker.estimator import Framework
from sagemaker.tensorflow import TensorFlow

class ScriptModeTensorFlow(Framework):
    """This class is temporary until the final version of Script Mode is released."""

    __framework_name__ = "tensorflow-scriptmode"

    create_model = TensorFlow.create_model

    def __init__(self, py_version='py3', **kwargs):
        super(ScriptModeTensorFlow, self).__init__(**kwargs)
        self.py_version = py_version
        self.image_name = None
        self.framework_version = '1.11'


Use [Local Mode](https://github.com/aws/sagemaker-python-sdk#local-mode) to run the script locally in the notebook instance before you run a SageMaker training job:

In [None]:
import os

hyperparameters = {'num_epochs': 1,
                   'data_dir': '/opt/ml/input/data/training',
                   'save_dir': '/opt/ml/model'}

estimator = ScriptModeTensorFlow(entry_point='train.py',
                                 source_dir='char-rnn-tensorflow',
                                 train_instance_type='local',  # Run in local mode
                                 train_instance_count=1,
                                 hyperparameters=hyperparameters,
                                 role=role)

estimator.fit({'training': 'file://%s' % os.path.join(os.getcwd(), 'sherlock')})

## How Script Mode executes the script in the container

The above cell downloads a Python 3 CPU container locally and simulates a SageMaker training job. When training starts, script mode installs the user script as a Python module. The module name matches the script name. In this case, **train.py** is transformed into a Python module named **train**.

After that, the Python interpreter executes the user module, passing **hyperparameters** as script arguments. The example above is executed as follows:
```bash
python -m train --num_epochs 1 --data_dir /opt/ml/input/data/training --save_dir /opt/ml/model
```
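
The module-style invocation can be reproduced outside the container. The following is a minimal sketch, not what the container actually runs: it writes a hypothetical stand-in `train.py` to a temporary directory and launches it with `python -m train`, and the stand-in simply echoes the arguments it receives. The argument name assumes the hyperparameter key is forwarded as written in the `hyperparameters` dictionary.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the real train.py: it only echoes its arguments.
STAND_IN_SCRIPT = "import sys\nprint(' '.join(sys.argv[1:]))\n"


def run_as_module(args):
    """Run a stand-in train.py with `python -m train` and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "train.py"), "w") as f:
            f.write(STAND_IN_SCRIPT)
        # With -m, Python searches the current working directory first,
        # so the module is resolved from workdir.
        result = subprocess.run(
            [sys.executable, "-m", "train"] + list(args),
            cwd=workdir,
            stdout=subprocess.PIPE,
            check=True,
        )
    return result.stdout.decode().strip()


output = run_as_module(["--num_epochs", "1"])
print(output)
```

Running the script as a module rather than as a file path is what lets the script name double as the module name.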

The **train** module consumes the hyperparameters using any argument parsing library. [The example we're using](https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/train.py#L11) uses the Python [argparse](https://docs.python.org/3/library/argparse.html) library:

```python
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
# Data and model checkpoints directories
parser.add_argument('--data_dir', type=str, default='data/tinyshakespeare', help='data directory containing input.txt with training examples')
parser.add_argument('--save_dir', type=str, default='save', help='directory to store checkpointed models')
...
args = parser.parse_args()

```
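
To make the link between the `hyperparameters` dictionary and the parsed arguments concrete, here is a small sketch. `to_cmd_args` is a hypothetical helper written for illustration, not the container's implementation, and it assumes each key is passed through unchanged with a `--` prefix. The `--num_epochs` option and its default are also assumptions about the script.

```python
import argparse


def to_cmd_args(hyperparameters):
    """Hypothetical helper: flatten a dict into ['--key', 'value', ...]."""
    args = []
    for key in sorted(hyperparameters):
        args.extend(["--" + key, str(hyperparameters[key])])
    return args


def build_parser():
    """Parser mirroring the two options shown above, plus an assumed --num_epochs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", type=str, default="data/tinyshakespeare")
    parser.add_argument("--save_dir", type=str, default="save")
    # Placeholder default; the real script may use a different value.
    parser.add_argument("--num_epochs", type=int, default=50)
    return parser


hyperparameters = {
    "num_epochs": 1,
    "data_dir": "/opt/ml/input/data/training",
    "save_dir": "/opt/ml/model",
}

cmd_args = to_cmd_args(hyperparameters)
args = build_parser().parse_args(cmd_args)
print(cmd_args)
print(args.num_epochs, args.data_dir, args.save_dir)
```

Every value arrives as a string on the command line, so the `type=` argument of each option is what restores integers such as `num_epochs`.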


Let's explain the values of **--data_dir** and **--save_dir**:

- **/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is downloaded to this folder because **training** is the channel name defined in ```estimator.fit({'training': inputs})```. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information. 

- **/opt/ml/model** is the directory where the script should save models, checkpoints, or any other output. Any data saved in this folder is uploaded to the S3 bucket defined for training. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.
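
The two directories can be exercised without a container. In the sketch below, temporary directories stand in for `/opt/ml/input/data/training` and `/opt/ml/model`. The function `build_vocabulary` and the output file name `vocab.txt` are hypothetical, chosen only to show a script reading `input.txt` from the data directory and writing a result to the save directory.

```python
import os
import tempfile


def build_vocabulary(data_dir, save_dir):
    """Read input.txt from data_dir and write its sorted character set to save_dir."""
    with open(os.path.join(data_dir, "input.txt"), encoding="utf-8") as f:
        text = f.read()
    vocab = sorted(set(text))
    os.makedirs(save_dir, exist_ok=True)
    vocab_path = os.path.join(save_dir, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("".join(vocab))
    return vocab_path


# Temporary directories stand in for the container's data and model folders.
with tempfile.TemporaryDirectory() as data_dir, tempfile.TemporaryDirectory() as save_dir:
    with open(os.path.join(data_dir, "input.txt"), "w", encoding="utf-8") as f:
        f.write("hello")
    vocab_path = build_vocabulary(data_dir, save_dir)
    with open(vocab_path, encoding="utf-8") as f:
        saved = f.read()

print(saved)
```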

### Reading additional information from the container

Often, a user script needs additional information from the container that is not available in ```hyperparameters```.
SageMaker containers write this information as **environment variables** that are available inside the script.

For instance, the script above can read the location of the **training** channel provided in the training job request by using the environment variable `SM_CHANNEL_TRAINING` as the default value for the `--data_dir` argument:

```python
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Use the location of the training input channel, read from the environment, as the default
    parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
```

Script mode displays the list of available environment variables in the training logs. You can find the [entire list here](https://github.com/aws/sagemaker-containers/blob/master/README.md#environment-variables-full-specification).
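
Indexing `os.environ` directly raises a `KeyError` when the variable is not set, for example when the script runs outside the container. A hedged variant, assuming you want the same script to work in both places, falls back to a local default with `os.environ.get`. The fallback paths are placeholders, and using `SM_MODEL_DIR` as the default for `--save_dir` is an illustrative choice rather than something the example repository does.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse arguments, preferring SageMaker environment variables when present."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_dir",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAINING", "data/tinyshakespeare"),
    )
    parser.add_argument(
        "--save_dir",
        type=str,
        default=os.environ.get("SM_MODEL_DIR", "save"),
    )
    return parser.parse_args(argv)


# Simulate the container setting the channel variable while the model
# directory variable is absent.
os.environ["SM_CHANNEL_TRAINING"] = "/opt/ml/input/data/training"
os.environ.pop("SM_MODEL_DIR", None)

args = parse_args([])
print(args.data_dir, args.save_dir)
```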

# Training in SageMaker

After you test the training job locally, upload the dataset to an S3 bucket so SageMaker can access the data during training.


In [None]:
inputs = sagemaker_session.upload_data(path='sherlock', bucket=bucket, key_prefix='datasets/sherlock')
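
As a rough guide to what `inputs` holds: when `path` is a directory, `upload_data` is expected to return an S3 URI built from the bucket and key prefix. The sketch below only mimics that string format without calling AWS. The helper name and the bucket name are hypothetical, and the exact return value should be confirmed against the SDK version in use.

```python
def expected_s3_uri(bucket, key_prefix):
    """Hypothetical helper: the S3 URI layout assumed for an uploaded directory."""
    return "s3://{}/{}".format(bucket, key_prefix.strip("/"))


# "example-bucket" is a placeholder, not a real bucket.
uri = expected_s3_uri("example-bucket", "datasets/sherlock")
print(uri)
```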

To train in SageMaker, change the estimator argument **train_instance_type** to any SageMaker ML instance type available for training. For example:

In [None]:
estimator = ScriptModeTensorFlow(entry_point='train.py',
                                 source_dir='char-rnn-tensorflow',
                                 train_instance_type='ml.c4.xlarge',
                                 train_instance_count=1,
                                 hyperparameters=hyperparameters,
                                 role=role)

estimator.fit({'training': inputs})

# Installing additional requirements

## Installing pip packages

Script Mode installs the contents of your `source_dir` folder in the container as a [Python package](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py#L100). You can include a [requirements.txt file in the root folder of your source_dir to install any pip dependencies](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_modules.py#L111). You can, for example, install a specific version of TensorFlow in the container:

Contents of `requirements.txt`:
```
tensorflow==1.11.0
```

## Installing apt-get packages and other dependencies
You can define a `setup.py` file in your `source_dir` folder to install other dependencies. The example below installs [TensorFlow for C](https://www.tensorflow.org/install/lang_c) in the container.

In [None]:
!mkdir tf_c

In [None]:
%%writefile tf_c/get-tf-c.sh

wget -q -t 3 https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-cpu-linux-x86_64-1.11.0.tar.gz
tar -xzvf libtensorflow-cpu-linux-x86_64-1.11.0.tar.gz -C /usr/local

ldconfig

gcc -I/usr/local/include -L/usr/local/lib hello_tf.c -ltensorflow -o hello_tf
cp hello_tf /usr/bin/

In [None]:
%%writefile tf_c/hello_tf.c

#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  printf("Hello from TensorFlow C library version %s\n", TF_Version());
  return 0;
}

In [None]:
%%writefile tf_c/setup.py
import subprocess
from distutils.command.build_py import build_py as _build_py

from setuptools import setup


class build_py(_build_py):
    """Build step that also installs TensorFlow for C in the container."""

    def run(self):
        # Download the TensorFlow C library and compile hello_tf before the regular build
        subprocess.check_output(['bash', './get-tf-c.sh'])

        super(build_py, self).run()


setup(packages=[''],
      name="test",
      version='1.0.0',
      cmdclass={'build_py': build_py},
      include_package_data=True)

In [None]:
%%writefile tf_c/train_c.py

import subprocess

message = subprocess.check_output('hello_tf')
assert message == b'Hello from TensorFlow C library version 1.11.0\n'

In [None]:
estimator = ScriptModeTensorFlow(entry_point='train_c.py',
                                 source_dir='tf_c',
                                 train_instance_type='local',
                                 train_instance_count=1,
                                 role=role)

estimator.fit({})