{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fine tune a PyTorch BERT model and deploy it with Elastic Inference on Amazon SageMaker\n", "\n", "Text classification is a technique for putting text into different categories and has a wide range of applications: email providers use text classification to detect to spam emails, marketing agencies use it for sentiment analysis of customer reviews, and moderators of discussion forums use it to detect inappropriate comments.\n", "\n", "In the past, data scientists used methods such as [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf), [word2vec](https://en.wikipedia.org/wiki/Word2vec), or [bag-of-words (BOW)](https://en.wikipedia.org/wiki/Bag-of-words_model) to generate features for training classification models. While these techniques have been very successful in many NLP tasks, they don't always capture the meanings of words accurately when they appear in different contexts. Recently, we see increasing interest in using Bidirectional Encoder Representations from Transformers (BERT) to achieve better results in text classification tasks, due to its ability more accurately encode the meaning of words in different contexts.\n", "\n", "BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million words and 2,500 million words, respectively. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, one can quickly fine tune BERT for another use case with a relatively small amount of training data to achieve state-of-the-art results for common NLP tasks, such as text classification and question answering. \n", "\n", "Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models. The SageMaker Python SDK provides open source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several different machine learning and deep learning frameworks.\n", "\n", "Our customers often ask for quick fine-tuning and easy deployment of their NLP models. Furthermore, customers prefer low inference latency and low model inference cost. [Amazon Elastic Inference](https://aws.amazon.com/machine-learning/elastic-inference) enables attaching GPU-powered inference acceleration to endpoints, reducing the cost of deep learning inference without sacrificing performance.\n", "\n", "This blog post demonstrates how to use Amazon SageMaker to fine tune a PyTorch BERT model and deploy it with Elastic Inference. This work is inspired by a post by [Chris McCormick and Nick Ryan](https://mccormickml.com/2019/07/22/BERT-fine-tuning).\n", "\n", "In this example, we walk through our dataset, the training process, and finally model deployment. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup\n", "\n", "To start, we import some Python libraries and initialize a SageMaker session, S3 bucket and prefix, and IAM role." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# need torch 1.3.1 for elastic inference\n", "!pip install torch\n", "!pip install transformers" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "import pandas as pd\n", "import sagemaker\n", "\n", "sagemaker_session = sagemaker.Session()\n", "\n", "bucket = sagemaker_session.default_bucket()\n", "prefix = \"sagemaker/DEMO-pytorch-bert\"\n", "\n", "role = sagemaker.get_execution_role()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Prepare training data\n", "\n", "We use Corpus of Linguistic Acceptability (CoLA) (https://nyu-mll.github.io/CoLA/), a dataset of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. We download and unzip the data using the following code:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "if not os.path.exists(\"./cola_public_1.1.zip\"):\n", " !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip\n", "if not os.path.exists(\"./cola_public/\"):\n", " !unzip cola_public_1.1.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get sentences and labels\n", "\n", "Let us take a quick look at our data. First we read in the training data. The only two columns we need are the sentence itself and its label. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\n", " \"./cola_public/raw/in_domain_train.tsv\",\n", " sep=\"\\t\",\n", " header=None,\n", " usecols=[1, 3],\n", " names=[\"label\", \"sentence\"],\n", ")\n", "sentences = df.sentence.values\n", "labels = df.label.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Printing out a few sentences shows us how sentences are labeled based on their grammatical completeness. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['The professor talked us.' 'We yelled ourselves hoarse.'\n", " 'We yelled ourselves.' 'We yelled Harry hoarse.'\n", " 'Harry coughed himself into a fit.']\n", "[0 1 0 0 1]\n" ] } ], "source": [ "print(sentences[20:25])\n", "print(labels[20:25])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then split the dataset for training and testing." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "train, test = train_test_split(df)\n", "train.to_csv(\"./cola_public/train.csv\", index=False)\n", "test.to_csv(\"./cola_public/test.csv\", index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we upload both to Amazon S3 for use later. 
The SageMaker Python SDK provides a helpful function for uploading to Amazon S3:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "inputs_train = sagemaker_session.upload_data(\"./cola_public/train.csv\", bucket=bucket, key_prefix=prefix)\n", "inputs_test = sagemaker_session.upload_data(\"./cola_public/test.csv\", bucket=bucket, key_prefix=prefix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Run training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training script\n", "\n", "We use the [PyTorch-Transformers library](https://pytorch.org/hub/huggingface_pytorch-transformers), which contains PyTorch implementations and pre-trained model weights for many NLP models, including BERT.\n", "\n", "Our training script should save model artifacts learned during training to a file path called `model_dir`, as stipulated by the SageMaker PyTorch image. Upon completion of training, model artifacts saved in `model_dir` will be uploaded to S3 by SageMaker and will become available in S3 for deployment.\n", "\n", "We save this script in a file named `train_deploy.py`, and put the file in a directory named `code/`. The full training script can be viewed under `code/`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36margparse\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mjson\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mlogging\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mos\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36msys\u001b[39;49;00m\n", "\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mnumpy\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36mnp\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mpandas\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36mpd\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdistributed\u001b[39;49;00m \u001b[34mas\u001b[39;49;00m \u001b[04m\u001b[36mdist\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mutils\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdata\u001b[39;49;00m\n", "\u001b[34mimport\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mutils\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdata\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdistributed\u001b[39;49;00m\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtorch\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mutils\u001b[39;49;00m\u001b[04m\u001b[36m.\u001b[39;49;00m\u001b[04m\u001b[36mdata\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m DataLoader, RandomSampler, TensorDataset\n", "\u001b[34mfrom\u001b[39;49;00m \u001b[04m\u001b[36mtransformers\u001b[39;49;00m \u001b[34mimport\u001b[39;49;00m AdamW, BertForSequenceClassification, BertTokenizer\n", "\n", "logger = logging.getLogger(\u001b[31m__name__\u001b[39;49;00m)\n", "logger.setLevel(logging.DEBUG)\n", 
"logger.addHandler(logging.StreamHandler(sys.stdout))\n", "\n", "MAX_LEN = \u001b[34m64\u001b[39;49;00m \u001b[37m# this is the max length of the sentence\u001b[39;49;00m\n", "\n", "\u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33mLoading BERT tokenizer...\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", "tokenizer = BertTokenizer.from_pretrained(\u001b[33m\"\u001b[39;49;00m\u001b[33mbert-base-uncased\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, do_lower_case=\u001b[34mTrue\u001b[39;49;00m)\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mflat_accuracy\u001b[39;49;00m(preds, labels):\n", " pred_flat = np.argmax(preds, axis=\u001b[34m1\u001b[39;49;00m).flatten()\n", " labels_flat = labels.flatten()\n", " \u001b[34mreturn\u001b[39;49;00m np.sum(pred_flat == labels_flat) / \u001b[36mlen\u001b[39;49;00m(labels_flat)\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32m_get_train_data_loader\u001b[39;49;00m(batch_size, training_dir, is_distributed):\n", " logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mGet train data loader\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", "\n", " dataset = pd.read_csv(os.path.join(training_dir, \u001b[33m\"\u001b[39;49;00m\u001b[33mtrain.csv\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m))\n", " sentences = dataset.sentence.values\n", " labels = dataset.label.values\n", "\n", " input_ids = []\n", " \u001b[34mfor\u001b[39;49;00m sent \u001b[35min\u001b[39;49;00m sentences:\n", " encoded_sent = tokenizer.encode(sent, add_special_tokens=\u001b[34mTrue\u001b[39;49;00m)\n", " input_ids.append(encoded_sent)\n", "\n", " \u001b[37m# pad shorter sentences\u001b[39;49;00m\n", " input_ids_padded = []\n", " \u001b[34mfor\u001b[39;49;00m i \u001b[35min\u001b[39;49;00m input_ids:\n", " \u001b[34mwhile\u001b[39;49;00m \u001b[36mlen\u001b[39;49;00m(i) < MAX_LEN:\n", " i.append(\u001b[34m0\u001b[39;49;00m)\n", " input_ids_padded.append(i)\n", " input_ids = input_ids_padded\n", "\n", " \u001b[37m# mask; 0: added, 1: otherwise\u001b[39;49;00m\n", " attention_masks = []\n", " \u001b[37m# For each sentence...\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m sent \u001b[35min\u001b[39;49;00m input_ids:\n", " att_mask = [\u001b[36mint\u001b[39;49;00m(token_id > \u001b[34m0\u001b[39;49;00m) \u001b[34mfor\u001b[39;49;00m token_id \u001b[35min\u001b[39;49;00m sent]\n", " attention_masks.append(att_mask)\n", "\n", " \u001b[37m# convert to PyTorch data types.\u001b[39;49;00m\n", " train_inputs = torch.tensor(input_ids)\n", " train_labels = torch.tensor(labels)\n", " train_masks = torch.tensor(attention_masks)\n", "\n", " train_data = TensorDataset(train_inputs, train_masks, train_labels)\n", " \u001b[34mif\u001b[39;49;00m is_distributed:\n", " train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)\n", " \u001b[34melse\u001b[39;49;00m:\n", " train_sampler = RandomSampler(train_data)\n", " train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)\n", "\n", " \u001b[34mreturn\u001b[39;49;00m train_dataloader\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32m_get_test_data_loader\u001b[39;49;00m(test_batch_size, training_dir):\n", " dataset = pd.read_csv(os.path.join(training_dir, \u001b[33m\"\u001b[39;49;00m\u001b[33mtest.csv\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m))\n", " sentences = dataset.sentence.values\n", " labels = dataset.label.values\n", "\n", " input_ids = []\n", " \u001b[34mfor\u001b[39;49;00m sent \u001b[35min\u001b[39;49;00m sentences:\n", " encoded_sent = tokenizer.encode(sent, 
add_special_tokens=\u001b[34mTrue\u001b[39;49;00m)\n", " input_ids.append(encoded_sent)\n", "\n", " \u001b[37m# pad shorter sentences\u001b[39;49;00m\n", " input_ids_padded = []\n", " \u001b[34mfor\u001b[39;49;00m i \u001b[35min\u001b[39;49;00m input_ids:\n", " \u001b[34mwhile\u001b[39;49;00m \u001b[36mlen\u001b[39;49;00m(i) < MAX_LEN:\n", " i.append(\u001b[34m0\u001b[39;49;00m)\n", " input_ids_padded.append(i)\n", " input_ids = input_ids_padded\n", "\n", " \u001b[37m# mask; 0: added, 1: otherwise\u001b[39;49;00m\n", " attention_masks = []\n", " \u001b[37m# For each sentence...\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m sent \u001b[35min\u001b[39;49;00m input_ids:\n", " att_mask = [\u001b[36mint\u001b[39;49;00m(token_id > \u001b[34m0\u001b[39;49;00m) \u001b[34mfor\u001b[39;49;00m token_id \u001b[35min\u001b[39;49;00m sent]\n", " attention_masks.append(att_mask)\n", "\n", " \u001b[37m# convert to PyTorch data types.\u001b[39;49;00m\n", " train_inputs = torch.tensor(input_ids)\n", " train_labels = torch.tensor(labels)\n", " train_masks = torch.tensor(attention_masks)\n", "\n", " train_data = TensorDataset(train_inputs, train_masks, train_labels)\n", " train_sampler = RandomSampler(train_data)\n", " train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=test_batch_size)\n", "\n", " \u001b[34mreturn\u001b[39;49;00m train_dataloader\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mtrain\u001b[39;49;00m(args):\n", " is_distributed = \u001b[36mlen\u001b[39;49;00m(args.hosts) > \u001b[34m1\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m args.backend \u001b[35mis\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m \u001b[34mNone\u001b[39;49;00m\n", " logger.debug(\u001b[33m\"\u001b[39;49;00m\u001b[33mDistributed training - \u001b[39;49;00m\u001b[33m%s\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, is_distributed)\n", " use_cuda = args.num_gpus > \u001b[34m0\u001b[39;49;00m\n", " logger.debug(\u001b[33m\"\u001b[39;49;00m\u001b[33mNumber of gpus available - \u001b[39;49;00m\u001b[33m%d\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, args.num_gpus)\n", " device = torch.device(\u001b[33m\"\u001b[39;49;00m\u001b[33mcuda\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m \u001b[34mif\u001b[39;49;00m use_cuda \u001b[34melse\u001b[39;49;00m \u001b[33m\"\u001b[39;49;00m\u001b[33mcpu\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", "\n", " \u001b[34mif\u001b[39;49;00m is_distributed:\n", " \u001b[37m# Initialize the distributed environment.\u001b[39;49;00m\n", " world_size = \u001b[36mlen\u001b[39;49;00m(args.hosts)\n", " os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mWORLD_SIZE\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m] = \u001b[36mstr\u001b[39;49;00m(world_size)\n", " host_rank = args.hosts.index(args.current_host)\n", " os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mRANK\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m] = \u001b[36mstr\u001b[39;49;00m(host_rank)\n", " dist.init_process_group(backend=args.backend, rank=host_rank, world_size=world_size)\n", " logger.info(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mInitialized the distributed environment: \u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m%s\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m backend on \u001b[39;49;00m\u001b[33m%d\u001b[39;49;00m\u001b[33m nodes. \u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mCurrent host rank is \u001b[39;49;00m\u001b[33m%d\u001b[39;49;00m\u001b[33m. 
Number of gpus: \u001b[39;49;00m\u001b[33m%d\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " args.backend, dist.get_world_size(),\n", " dist.get_rank(), args.num_gpus\n", " )\n", "\n", " \u001b[37m# set the seed for generating random numbers\u001b[39;49;00m\n", " torch.manual_seed(args.seed)\n", " \u001b[34mif\u001b[39;49;00m use_cuda:\n", " torch.cuda.manual_seed(args.seed)\n", "\n", " train_loader = _get_train_data_loader(args.batch_size, args.data_dir, is_distributed)\n", " test_loader = _get_test_data_loader(args.test_batch_size, args.test)\n", "\n", " logger.debug(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mProcesses \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m/\u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m (\u001b[39;49;00m\u001b[33m{:.0f}\u001b[39;49;00m\u001b[33m%\u001b[39;49;00m\u001b[33m) of train data\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.format(\n", " \u001b[36mlen\u001b[39;49;00m(train_loader.sampler),\n", " \u001b[36mlen\u001b[39;49;00m(train_loader.dataset),\n", " \u001b[34m100.0\u001b[39;49;00m * \u001b[36mlen\u001b[39;49;00m(train_loader.sampler) / \u001b[36mlen\u001b[39;49;00m(train_loader.dataset),\n", " )\n", " )\n", "\n", " logger.debug(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mProcesses \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m/\u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m (\u001b[39;49;00m\u001b[33m{:.0f}\u001b[39;49;00m\u001b[33m%\u001b[39;49;00m\u001b[33m) of test data\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.format(\n", " \u001b[36mlen\u001b[39;49;00m(test_loader.sampler),\n", " \u001b[36mlen\u001b[39;49;00m(test_loader.dataset),\n", " \u001b[34m100.0\u001b[39;49;00m * \u001b[36mlen\u001b[39;49;00m(test_loader.sampler) / \u001b[36mlen\u001b[39;49;00m(test_loader.dataset),\n", " )\n", " )\n", "\n", " logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mStarting BertForSequenceClassification\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " model = BertForSequenceClassification.from_pretrained(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mbert-base-uncased\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[37m# Use the 12-layer BERT model, with an uncased vocab.\u001b[39;49;00m\n", " num_labels=args.num_labels, \u001b[37m# The number of output labels--2 for binary classification.\u001b[39;49;00m\n", " output_attentions=\u001b[34mFalse\u001b[39;49;00m, \u001b[37m# Whether the model returns attentions weights.\u001b[39;49;00m\n", " output_hidden_states=\u001b[34mFalse\u001b[39;49;00m, \u001b[37m# Whether the model returns all hidden-states.\u001b[39;49;00m\n", " )\n", "\n", " model = model.to(device)\n", " \u001b[34mif\u001b[39;49;00m is_distributed \u001b[35mand\u001b[39;49;00m use_cuda:\n", " \u001b[37m# multi-machine multi-gpu case\u001b[39;49;00m\n", " model = torch.nn.parallel.DistributedDataParallel(model)\n", " \u001b[34melse\u001b[39;49;00m:\n", " \u001b[37m# single-machine multi-gpu case or single-machine or multi-machine cpu case\u001b[39;49;00m\n", " model = torch.nn.DataParallel(model)\n", " optimizer = AdamW(\n", " model.parameters(),\n", " lr=\u001b[34m2e-5\u001b[39;49;00m, \u001b[37m# args.learning_rate - default is 5e-5, our notebook had 2e-5\u001b[39;49;00m\n", " eps=\u001b[34m1e-8\u001b[39;49;00m, \u001b[37m# args.adam_epsilon - default is 1e-8.\u001b[39;49;00m\n", " )\n", "\n", " logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mEnd of defining BertForSequenceClassification\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " 
\u001b[34mfor\u001b[39;49;00m epoch \u001b[35min\u001b[39;49;00m \u001b[36mrange\u001b[39;49;00m(\u001b[34m1\u001b[39;49;00m, args.epochs + \u001b[34m1\u001b[39;49;00m):\n", " total_loss = \u001b[34m0\u001b[39;49;00m\n", " model.train()\n", " \u001b[34mfor\u001b[39;49;00m step, batch \u001b[35min\u001b[39;49;00m \u001b[36menumerate\u001b[39;49;00m(train_loader):\n", " b_input_ids = batch[\u001b[34m0\u001b[39;49;00m].to(device)\n", " b_input_mask = batch[\u001b[34m1\u001b[39;49;00m].to(device)\n", " b_labels = batch[\u001b[34m2\u001b[39;49;00m].to(device)\n", " model.zero_grad()\n", "\n", " outputs = model(b_input_ids, token_type_ids=\u001b[34mNone\u001b[39;49;00m, attention_mask=b_input_mask, labels=b_labels)\n", " loss = outputs[\u001b[34m0\u001b[39;49;00m]\n", "\n", " total_loss += loss.item()\n", " loss.backward()\n", " torch.nn.utils.clip_grad_norm_(model.parameters(), \u001b[34m1.0\u001b[39;49;00m)\n", " \u001b[37m# modified based on their gradients, the learning rate, etc.\u001b[39;49;00m\n", " optimizer.step()\n", " \u001b[34mif\u001b[39;49;00m step % args.log_interval == \u001b[34m0\u001b[39;49;00m:\n", " logger.info(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33mTrain Epoch: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m [\u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m/\u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m (\u001b[39;49;00m\u001b[33m{:.0f}\u001b[39;49;00m\u001b[33m%\u001b[39;49;00m\u001b[33m)] Loss: \u001b[39;49;00m\u001b[33m{:.6f}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.format(\n", " epoch,\n", " step * \u001b[36mlen\u001b[39;49;00m(batch[\u001b[34m0\u001b[39;49;00m]),\n", " \u001b[36mlen\u001b[39;49;00m(train_loader.sampler),\n", " \u001b[34m100.0\u001b[39;49;00m * step / \u001b[36mlen\u001b[39;49;00m(train_loader),\n", " loss.item(),\n", " )\n", " )\n", "\n", " logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mAverage training loss: \u001b[39;49;00m\u001b[33m%f\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, total_loss / \u001b[36mlen\u001b[39;49;00m(train_loader))\n", "\n", " test(model, test_loader, device)\n", "\n", " logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mSaving tuned model.\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " model_2_save = model.module \u001b[34mif\u001b[39;49;00m \u001b[36mhasattr\u001b[39;49;00m(model, \u001b[33m\"\u001b[39;49;00m\u001b[33mmodule\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[34melse\u001b[39;49;00m model\n", " model_2_save.save_pretrained(save_directory=args.model_dir)\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mtest\u001b[39;49;00m(model, test_loader, device):\n", " model.eval()\n", " _, eval_accuracy = \u001b[34m0\u001b[39;49;00m, \u001b[34m0\u001b[39;49;00m\n", "\n", " \u001b[34mwith\u001b[39;49;00m torch.no_grad():\n", " \u001b[34mfor\u001b[39;49;00m batch \u001b[35min\u001b[39;49;00m test_loader:\n", " b_input_ids = batch[\u001b[34m0\u001b[39;49;00m].to(device)\n", " b_input_mask = batch[\u001b[34m1\u001b[39;49;00m].to(device)\n", " b_labels = batch[\u001b[34m2\u001b[39;49;00m].to(device)\n", "\n", " outputs = model(b_input_ids, token_type_ids=\u001b[34mNone\u001b[39;49;00m, attention_mask=b_input_mask)\n", " logits = outputs[\u001b[34m0\u001b[39;49;00m]\n", " logits = logits.detach().cpu().numpy()\n", " label_ids = b_labels.to(\u001b[33m\"\u001b[39;49;00m\u001b[33mcpu\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m).numpy()\n", " tmp_eval_accuracy = flat_accuracy(logits, label_ids)\n", " eval_accuracy += tmp_eval_accuracy\n", "\n", " 
logger.info(\u001b[33m\"\u001b[39;49;00m\u001b[33mTest set: Accuracy: \u001b[39;49;00m\u001b[33m%f\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, tmp_eval_accuracy)\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mmodel_fn\u001b[39;49;00m(model_dir):\n", " device = torch.device(\u001b[33m\"\u001b[39;49;00m\u001b[33mcuda\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m \u001b[34mif\u001b[39;49;00m torch.cuda.is_available() \u001b[34melse\u001b[39;49;00m \u001b[33m\"\u001b[39;49;00m\u001b[33mcpu\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m================ objects in model_dir ===================\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(os.listdir(model_dir))\n", " model = BertForSequenceClassification.from_pretrained(model_dir)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m================ model loaded ===========================\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[34mreturn\u001b[39;49;00m model.to(device)\n", "\n", "\n", "\n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32minput_fn\u001b[39;49;00m(request_body, request_content_type):\n", " \u001b[33m\"\"\"An input_fn that loads a pickled tensor\"\"\"\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m request_content_type == \u001b[33m\"\u001b[39;49;00m\u001b[33mapplication/json\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m:\n", " data = json.loads(request_body)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m================ input sentences ===============\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(data)\n", " \n", " \u001b[34mif\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(data, \u001b[36mstr\u001b[39;49;00m):\n", " data = [data]\n", " \u001b[34melif\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(data, \u001b[36mlist\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m \u001b[36mlen\u001b[39;49;00m(data) > \u001b[34m0\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(data[\u001b[34m0\u001b[39;49;00m], \u001b[36mstr\u001b[39;49;00m):\n", " \u001b[34mpass\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\n", " \u001b[34mraise\u001b[39;49;00m \u001b[36mValueError\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33mUnsupported input type. Input type can be a string or an non-empty list. 
\u001b[39;49;00m\u001b[33m\\\u001b[39;49;00m\n", "\u001b[33m I got \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.format(data))\n", " \n", " \u001b[37m#encoded = [tokenizer.encode(x, add_special_tokens=True) for x in data]\u001b[39;49;00m\n", " \u001b[37m#encoded = tokenizer(data, add_special_tokens=True) \u001b[39;49;00m\n", " \n", " \u001b[37m# for backward compatibility use the following way to encode \u001b[39;49;00m\n", " \u001b[37m# https://github.com/huggingface/transformers/issues/5580\u001b[39;49;00m\n", " input_ids = [tokenizer.encode(x, add_special_tokens=\u001b[34mTrue\u001b[39;49;00m) \u001b[34mfor\u001b[39;49;00m x \u001b[35min\u001b[39;49;00m data]\n", " \n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m================ encoded sentences ==============\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(input_ids)\n", "\n", " \u001b[37m# pad shorter sentence\u001b[39;49;00m\n", " padded = torch.zeros(\u001b[36mlen\u001b[39;49;00m(input_ids), MAX_LEN) \n", " \u001b[34mfor\u001b[39;49;00m i, p \u001b[35min\u001b[39;49;00m \u001b[36menumerate\u001b[39;49;00m(input_ids):\n", " padded[i, :\u001b[36mlen\u001b[39;49;00m(p)] = torch.tensor(p)\n", " \n", " \u001b[37m# create mask\u001b[39;49;00m\n", " mask = (padded != \u001b[34m0\u001b[39;49;00m)\n", " \n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m================= padded input and attention mask ================\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(padded, \u001b[33m'\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, mask)\n", "\n", " \u001b[34mreturn\u001b[39;49;00m padded.long(), mask.long()\n", " \u001b[34mraise\u001b[39;49;00m \u001b[36mValueError\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33mUnsupported content type: \u001b[39;49;00m\u001b[33m{}\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m.format(request_content_type))\n", " \n", "\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mpredict_fn\u001b[39;49;00m(input_data, model):\n", " device = torch.device(\u001b[33m\"\u001b[39;49;00m\u001b[33mcuda\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m \u001b[34mif\u001b[39;49;00m torch.cuda.is_available() \u001b[34melse\u001b[39;49;00m \u001b[33m\"\u001b[39;49;00m\u001b[33mcpu\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " model.to(device)\n", " model.eval()\n", "\n", " input_id, input_mask = input_data\n", " input_id = input_id.to(device)\n", " input_mask = input_mask.to(device)\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m============== encoded data =================\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(input_id, input_mask)\n", " \u001b[34mwith\u001b[39;49;00m torch.no_grad():\n", " y = model(input_id, attention_mask=input_mask)[\u001b[34m0\u001b[39;49;00m]\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m\"\u001b[39;49;00m\u001b[33m=============== inference result =================\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " \u001b[36mprint\u001b[39;49;00m(y)\n", " \u001b[34mreturn\u001b[39;49;00m y\n", "\n", "\u001b[34mif\u001b[39;49;00m \u001b[31m__name__\u001b[39;49;00m == \u001b[33m\"\u001b[39;49;00m\u001b[33m__main__\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m:\n", " parser = argparse.ArgumentParser()\n", "\n", " \u001b[37m# Data and model checkpoints directories\u001b[39;49;00m\n", " parser.add_argument(\n", " 
\u001b[33m\"\u001b[39;49;00m\u001b[33m--num_labels\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=\u001b[34m2\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mN\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33minput batch size for training (default: 64)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n", " )\n", "\n", " parser.add_argument(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m--batch-size\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=\u001b[34m64\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mN\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33minput batch size for training (default: 64)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n", " )\n", " parser.add_argument(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m--test-batch-size\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=\u001b[34m1000\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mN\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33minput batch size for testing (default: 1000)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\n", " )\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--epochs\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=\u001b[34m2\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mN\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33mnumber of epochs to train (default: 10)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--lr\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mfloat\u001b[39;49;00m, default=\u001b[34m0.01\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mLR\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33mlearning rate (default: 0.01)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--momentum\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mfloat\u001b[39;49;00m, default=\u001b[34m0.5\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mM\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33mSGD momentum (default: 0.5)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--seed\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=\u001b[34m1\u001b[39;49;00m, metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mS\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, help=\u001b[33m\"\u001b[39;49;00m\u001b[33mrandom seed (default: 1)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m)\n", " parser.add_argument(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m--log-interval\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m,\n", " default=\u001b[34m50\u001b[39;49;00m,\n", " metavar=\u001b[33m\"\u001b[39;49;00m\u001b[33mN\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " help=\u001b[33m\"\u001b[39;49;00m\u001b[33mhow many batches to wait before logging training 
status\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " )\n", " parser.add_argument(\n", " \u001b[33m\"\u001b[39;49;00m\u001b[33m--backend\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " \u001b[36mtype\u001b[39;49;00m=\u001b[36mstr\u001b[39;49;00m,\n", " default=\u001b[34mNone\u001b[39;49;00m,\n", " help=\u001b[33m\"\u001b[39;49;00m\u001b[33mbackend for distributed training (tcp, gloo on cpu and gloo, nccl on gpu)\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m,\n", " )\n", "\n", " \u001b[37m# Container environment\u001b[39;49;00m\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--hosts\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mlist\u001b[39;49;00m, default=json.loads(os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_HOSTS\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m]))\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--current-host\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mstr\u001b[39;49;00m, default=os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_CURRENT_HOST\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--model-dir\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mstr\u001b[39;49;00m, default=os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_MODEL_DIR\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--data-dir\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mstr\u001b[39;49;00m, default=os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_CHANNEL_TRAINING\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--test\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mstr\u001b[39;49;00m, default=os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_CHANNEL_TESTING\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\n", " parser.add_argument(\u001b[33m\"\u001b[39;49;00m\u001b[33m--num-gpus\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m, \u001b[36mtype\u001b[39;49;00m=\u001b[36mint\u001b[39;49;00m, default=os.environ[\u001b[33m\"\u001b[39;49;00m\u001b[33mSM_NUM_GPUS\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m])\n", "\n", " train(parser.parse_args())\n" ] } ], "source": [ "!pygmentize code/train_deploy.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train on Amazon SageMaker\n", "\n", "We use Amazon SageMaker to train and deploy a model using our custom PyTorch code. The Amazon SageMaker Python SDK makes it easier to run a PyTorch script in Amazon SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. For more information on how to use this SDK with PyTorch, see [the SageMaker Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/using_pytorch.html).\n", "\n", "To start, we use the `PyTorch` estimator class to train our model. When creating our estimator, we make sure to specify a few things:\n", "\n", "* `entry_point`: the name of our PyTorch script. It contains our training script, which loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model. It also contains code to load and run the model during inference.\n", "* `source_dir`: the location of our training scripts and requirements.txt file. 
\"requirements.txt\" lists packages you want to use with your script.\n", "* `framework_version`: the PyTorch version we want to use\n", "\n", "The PyTorch estimator supports multi-machine, distributed PyTorch training. To use this, we just set train_instance_count to be greater than one. Our training script supports distributed training for only GPU instances. \n", "\n", "After creating the estimator, we then call fit(), which launches a training job. We use the Amazon S3 URIs where we uploaded the training data earlier." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2022-06-06 10:54:41 Starting - Starting the training job...\n", "2022-06-06 10:55:07 Starting - Preparing the instances for training............\n", "2022-06-06 10:57:08 Downloading - Downloading input data\n", "2022-06-06 10:57:08 Training - Downloading the training image......\n", "2022-06-06 10:57:54 Training - Training image download completed. Training in progress.\u001b[34mbash: cannot set terminal process group (-1): Inappropriate ioctl for device\u001b[0m\n", "\u001b[34mbash: no job control in this shell\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,555 sagemaker-containers INFO Imported framework sagemaker_pytorch_container.training\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,557 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,569 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,570 sagemaker_pytorch_container.training INFO Invoking user training script.\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,895 sagemaker-containers INFO Module default_user_module_name does not provide a setup.py. \u001b[0m\n", "\u001b[34mGenerating setup.py\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,895 sagemaker-containers INFO Generating setup.cfg\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,895 sagemaker-containers INFO Generating MANIFEST.in\u001b[0m\n", "\u001b[34m2022-06-06 10:57:56,896 sagemaker-containers INFO Installing module with the following command:\u001b[0m\n", "\u001b[34m/opt/conda/bin/python -m pip install . -r requirements.txt\u001b[0m\n", "\u001b[35mbash: cannot set terminal process group (-1): Inappropriate ioctl for device\u001b[0m\n", "\u001b[35mbash: no job control in this shell\u001b[0m\n", "\u001b[35m2022-06-06 10:57:56,840 sagemaker-containers INFO Imported framework sagemaker_pytorch_container.training\u001b[0m\n", "\u001b[35m2022-06-06 10:57:56,843 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2022-06-06 10:57:56,855 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.\u001b[0m\n", "\u001b[35m2022-06-06 10:57:56,856 sagemaker_pytorch_container.training INFO Invoking user training script.\u001b[0m\n", "\u001b[35m2022-06-06 10:57:57,359 sagemaker-containers INFO Module default_user_module_name does not provide a setup.py. \u001b[0m\n", "\u001b[35mGenerating setup.py\u001b[0m\n", "\u001b[35m2022-06-06 10:57:57,359 sagemaker-containers INFO Generating setup.cfg\u001b[0m\n", "\u001b[35m2022-06-06 10:57:57,359 sagemaker-containers INFO Generating MANIFEST.in\u001b[0m\n", "\u001b[35m2022-06-06 10:57:57,359 sagemaker-containers INFO Installing module with the following command:\u001b[0m\n", "\u001b[35m/opt/conda/bin/python -m pip install . 
-r requirements.txt\u001b[0m\n", "\u001b[34mProcessing /tmp/tmp8c62n5js/module_dir\u001b[0m\n", "\u001b[34mRequirement already satisfied: tqdm in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (4.36.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: requests==2.22.0 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (2.22.0)\u001b[0m\n", "\u001b[35mProcessing /tmp/tmph7y6wpg5/module_dir\u001b[0m\n", "\u001b[35mRequirement already satisfied: tqdm in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 1)) (4.36.1)\u001b[0m\n", "\u001b[35mRequirement already satisfied: requests==2.22.0 in /opt/conda/lib/python3.6/site-packages (from -r requirements.txt (line 2)) (2.22.0)\u001b[0m\n", "\u001b[34mCollecting regex\n", " Downloading regex-2022.6.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (749 kB)\u001b[0m\n", "\u001b[34mCollecting sentencepiece\n", " Downloading sentencepiece-0.1.96-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\u001b[0m\n", "\u001b[34mCollecting sacremoses\n", " Downloading sacremoses-0.0.53.tar.gz (880 kB)\u001b[0m\n", "\u001b[35mCollecting regex\n", " Downloading regex-2022.6.2-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (749 kB)\u001b[0m\n", "\u001b[35mCollecting sentencepiece\n", " Downloading sentencepiece-0.1.96-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\u001b[0m\n", "\u001b[35mCollecting sacremoses\n", " Downloading sacremoses-0.0.53.tar.gz (880 kB)\u001b[0m\n", "\u001b[34mCollecting transformers==2.3.0\n", " Downloading transformers-2.3.0-py3-none-any.whl (447 kB)\u001b[0m\n", "\u001b[34mRequirement already satisfied: idna<2.9,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (2.8)\u001b[0m\n", "\u001b[34mRequirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (3.0.4)\u001b[0m\n", "\u001b[34mRequirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (1.25.7)\u001b[0m\n", "\u001b[34mRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (2019.11.28)\u001b[0m\n", "\u001b[34mRequirement already satisfied: six in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (1.12.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: click in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (7.0)\u001b[0m\n", "\u001b[34mRequirement already satisfied: joblib in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (0.14.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from transformers==2.3.0->-r requirements.txt (line 6)) (1.16.4)\u001b[0m\n", "\u001b[34mRequirement already satisfied: boto3 in /opt/conda/lib/python3.6/site-packages (from transformers==2.3.0->-r requirements.txt (line 6)) (1.11.7)\u001b[0m\n", "\u001b[34mRequirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/conda/lib/python3.6/site-packages (from boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.9.4)\u001b[0m\n", "\u001b[34mRequirement already satisfied: botocore<1.15.0,>=1.14.7 in /opt/conda/lib/python3.6/site-packages (from 
boto3->transformers==2.3.0->-r requirements.txt (line 6)) (1.14.7)\u001b[0m\n", "\u001b[34mRequirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /opt/conda/lib/python3.6/site-packages (from boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.3.1)\u001b[0m\n", "\u001b[34mRequirement already satisfied: docutils<0.16,>=0.10 in /opt/conda/lib/python3.6/site-packages (from botocore<1.15.0,>=1.14.7->boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.15.2)\u001b[0m\n", "\u001b[34mRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.6/site-packages (from botocore<1.15.0,>=1.14.7->boto3->transformers==2.3.0->-r requirements.txt (line 6)) (2.8.1)\u001b[0m\n", "\u001b[34mBuilding wheels for collected packages: sacremoses, default-user-module-name\n", " Building wheel for sacremoses (setup.py): started\n", " Building wheel for sacremoses (setup.py): finished with status 'done'\n", " Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895251 sha256=c95fbde8365ae1d7ce14458e072d4a1515421dda1d340d52e576ab02dcb4eebf\n", " Stored in directory: /root/.cache/pip/wheels/4c/64/31/e9900a234b23fb3e9dc565d6114a9d6ff84a72dbdd356502b4\n", " Building wheel for default-user-module-name (setup.py): started\u001b[0m\n", "\u001b[35mCollecting transformers==2.3.0\n", " Downloading transformers-2.3.0-py3-none-any.whl (447 kB)\u001b[0m\n", "\u001b[35mRequirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (1.25.7)\u001b[0m\n", "\u001b[35mRequirement already satisfied: idna<2.9,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (2.8)\u001b[0m\n", "\u001b[35mRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (2019.11.28)\u001b[0m\n", "\u001b[35mRequirement already satisfied: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests==2.22.0->-r requirements.txt (line 2)) (3.0.4)\u001b[0m\n", "\u001b[35mRequirement already satisfied: six in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (1.12.0)\u001b[0m\n", "\u001b[35mRequirement already satisfied: click in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (7.0)\u001b[0m\n", "\u001b[35mRequirement already satisfied: joblib in /opt/conda/lib/python3.6/site-packages (from sacremoses->-r requirements.txt (line 5)) (0.14.1)\u001b[0m\n", "\u001b[35mRequirement already satisfied: boto3 in /opt/conda/lib/python3.6/site-packages (from transformers==2.3.0->-r requirements.txt (line 6)) (1.11.7)\u001b[0m\n", "\u001b[35mRequirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from transformers==2.3.0->-r requirements.txt (line 6)) (1.16.4)\u001b[0m\n", "\u001b[35mRequirement already satisfied: jmespath<1.0.0,>=0.7.1 in /opt/conda/lib/python3.6/site-packages (from boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.9.4)\u001b[0m\n", "\u001b[35mRequirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /opt/conda/lib/python3.6/site-packages (from boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.3.1)\u001b[0m\n", "\u001b[35mRequirement already satisfied: botocore<1.15.0,>=1.14.7 in /opt/conda/lib/python3.6/site-packages (from boto3->transformers==2.3.0->-r requirements.txt (line 6)) (1.14.7)\u001b[0m\n", 
"\u001b[35mRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.6/site-packages (from botocore<1.15.0,>=1.14.7->boto3->transformers==2.3.0->-r requirements.txt (line 6)) (2.8.1)\u001b[0m\n", "\u001b[35mRequirement already satisfied: docutils<0.16,>=0.10 in /opt/conda/lib/python3.6/site-packages (from botocore<1.15.0,>=1.14.7->boto3->transformers==2.3.0->-r requirements.txt (line 6)) (0.15.2)\u001b[0m\n", "\u001b[35mBuilding wheels for collected packages: sacremoses, default-user-module-name\n", " Building wheel for sacremoses (setup.py): started\n", " Building wheel for sacremoses (setup.py): finished with status 'done'\n", " Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895251 sha256=fee9e66614caad63da49b6fd143eeffa666d69fa99089a524212bb087fc94bd7\n", " Stored in directory: /root/.cache/pip/wheels/4c/64/31/e9900a234b23fb3e9dc565d6114a9d6ff84a72dbdd356502b4\n", " Building wheel for default-user-module-name (setup.py): started\u001b[0m\n", "\u001b[34m Building wheel for default-user-module-name (setup.py): finished with status 'done'\n", " Created wheel for default-user-module-name: filename=default_user_module_name-1.0.0-py2.py3-none-any.whl size=15590 sha256=d7f30bf7fb621d0af1f1868d6bb5fe83fa2138ba4b1086e18b5f4e96a7a24a05\n", " Stored in directory: /tmp/pip-ephem-wheel-cache-ww1zvw5j/wheels/34/3d/1a/6789c912a32cf0107edfc1b298d18a99538755fbfcf9820987\u001b[0m\n", "\u001b[34mSuccessfully built sacremoses default-user-module-name\u001b[0m\n", "\u001b[34mInstalling collected packages: regex, sentencepiece, sacremoses, transformers, default-user-module-name\u001b[0m\n", "\u001b[35m Building wheel for default-user-module-name (setup.py): finished with status 'done'\n", " Created wheel for default-user-module-name: filename=default_user_module_name-1.0.0-py2.py3-none-any.whl size=15591 sha256=0ed7981b4e57619d4bd2d5bdfc257321d156ee6bd396d8c759628da91dc636dc\n", " Stored in directory: /tmp/pip-ephem-wheel-cache-87op9bei/wheels/ff/12/cc/c53a069e893e6d5a66a2ec988ca26a11564bce482fe1937fe0\u001b[0m\n", "\u001b[35mSuccessfully built sacremoses default-user-module-name\u001b[0m\n", "\u001b[35mInstalling collected packages: regex, sentencepiece, sacremoses, transformers, default-user-module-name\u001b[0m\n", "\u001b[35mSuccessfully installed default-user-module-name-1.0.0 regex-2022.6.2 sacremoses-0.0.53 sentencepiece-0.1.96 transformers-2.3.0\u001b[0m\n", "\u001b[34mSuccessfully installed default-user-module-name-1.0.0 regex-2022.6.2 sacremoses-0.0.53 sentencepiece-0.1.96 transformers-2.3.0\u001b[0m\n", "\u001b[34mWARNING: You are using pip version 20.0.1; however, version 21.3.1 is available.\u001b[0m\n", "\u001b[34mYou should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n", "\u001b[34m2022-06-06 10:58:01,563 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2022-06-06 10:58:01,578 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2022-06-06 10:58:01,593 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[34m2022-06-06 10:58:01,605 sagemaker-containers INFO Invoking user script\u001b[0m\n", "\u001b[34mTraining Env:\u001b[0m\n", "\u001b[34m{\n", " \"additional_framework_parameters\": {},\n", " \"channel_input_dirs\": {\n", " \"testing\": \"/opt/ml/input/data/testing\",\n", " \"training\": \"/opt/ml/input/data/training\"\n", " },\n", " \"current_host\": 
\"algo-1\",\n", " \"framework_module\": \"sagemaker_pytorch_container.training:main\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"hyperparameters\": {\n", " \"backend\": \"gloo\",\n", " \"epochs\": 1,\n", " \"num_labels\": 2\n", " },\n", " \"input_config_dir\": \"/opt/ml/input/config\",\n", " \"input_data_config\": {\n", " \"testing\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", " },\n", " \"training\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", " }\n", " },\n", " \"input_dir\": \"/opt/ml/input\",\n", " \"is_master\": true,\n", " \"job_name\": \"pytorch-training-2022-06-06-10-54-41-179\",\n", " \"log_level\": 20,\n", " \"master_hostname\": \"algo-1\",\n", " \"model_dir\": \"/opt/ml/model\",\n", " \"module_dir\": \"s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\",\n", " \"module_name\": \"train_deploy\",\n", " \"network_interface_name\": \"eth0\",\n", " \"num_cpus\": 16,\n", " \"num_gpus\": 0,\n", " \"output_data_dir\": \"/opt/ml/output/data\",\n", " \"output_dir\": \"/opt/ml/output\",\n", " \"output_intermediate_dir\": \"/opt/ml/output/intermediate\",\n", " \"resource_config\": {\n", " \"current_host\": \"algo-1\",\n", " \"current_instance_type\": \"ml.c4.4xlarge\",\n", " \"current_group_name\": \"homogeneousCluster\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"instance_groups\": [\n", " {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.c4.4xlarge\",\n", " \"hosts\": [\n", " \"algo-2\",\n", " \"algo-1\"\n", " ]\n", " }\n", " ],\n", " \"network_interface_name\": \"eth0\"\n", " },\n", " \"user_entry_point\": \"train_deploy.py\"\u001b[0m\n", "\u001b[34m}\u001b[0m\n", "\u001b[34mEnvironment variables:\u001b[0m\n", "\u001b[34mSM_HOSTS=[\"algo-1\",\"algo-2\"]\u001b[0m\n", "\u001b[34mSM_NETWORK_INTERFACE_NAME=eth0\u001b[0m\n", "\u001b[34mSM_HPS={\"backend\":\"gloo\",\"epochs\":1,\"num_labels\":2}\u001b[0m\n", "\u001b[34mSM_USER_ENTRY_POINT=train_deploy.py\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_PARAMS={}\u001b[0m\n", "\u001b[34mSM_RESOURCE_CONFIG={\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.c4.4xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-2\",\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.c4.4xlarge\"}],\"network_interface_name\":\"eth0\"}\u001b[0m\n", "\u001b[34mSM_INPUT_DATA_CONFIG={\"testing\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}}\u001b[0m\n", "\u001b[34mSM_OUTPUT_DATA_DIR=/opt/ml/output/data\u001b[0m\n", "\u001b[34mSM_CHANNELS=[\"testing\",\"training\"]\u001b[0m\n", "\u001b[34mSM_CURRENT_HOST=algo-1\u001b[0m\n", "\u001b[34mSM_MODULE_NAME=train_deploy\u001b[0m\n", "\u001b[34mSM_LOG_LEVEL=20\u001b[0m\n", "\u001b[34mSM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main\u001b[0m\n", "\u001b[34mSM_INPUT_DIR=/opt/ml/input\u001b[0m\n", "\u001b[34mSM_INPUT_CONFIG_DIR=/opt/ml/input/config\u001b[0m\n", "\u001b[34mSM_OUTPUT_DIR=/opt/ml/output\u001b[0m\n", "\u001b[34mSM_NUM_CPUS=16\u001b[0m\n", "\u001b[34mSM_NUM_GPUS=0\u001b[0m\n", 
"\u001b[34mSM_MODEL_DIR=/opt/ml/model\u001b[0m\n", "\u001b[34mSM_MODULE_DIR=s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\u001b[0m\n", "\u001b[34mSM_TRAINING_ENV={\"additional_framework_parameters\":{},\"channel_input_dirs\":{\"testing\":\"/opt/ml/input/data/testing\",\"training\":\"/opt/ml/input/data/training\"},\"current_host\":\"algo-1\",\"framework_module\":\"sagemaker_pytorch_container.training:main\",\"hosts\":[\"algo-1\",\"algo-2\"],\"hyperparameters\":{\"backend\":\"gloo\",\"epochs\":1,\"num_labels\":2},\"input_config_dir\":\"/opt/ml/input/config\",\"input_data_config\":{\"testing\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}},\"input_dir\":\"/opt/ml/input\",\"is_master\":true,\"job_name\":\"pytorch-training-2022-06-06-10-54-41-179\",\"log_level\":20,\"master_hostname\":\"algo-1\",\"model_dir\":\"/opt/ml/model\",\"module_dir\":\"s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\",\"module_name\":\"train_deploy\",\"network_interface_name\":\"eth0\",\"num_cpus\":16,\"num_gpus\":0,\"output_data_dir\":\"/opt/ml/output/data\",\"output_dir\":\"/opt/ml/output\",\"output_intermediate_dir\":\"/opt/ml/output/intermediate\",\"resource_config\":{\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-1\",\"current_instance_type\":\"ml.c4.4xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-2\",\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.c4.4xlarge\"}],\"network_interface_name\":\"eth0\"},\"user_entry_point\":\"train_deploy.py\"}\u001b[0m\n", "\u001b[34mSM_USER_ARGS=[\"--backend\",\"gloo\",\"--epochs\",\"1\",\"--num_labels\",\"2\"]\u001b[0m\n", "\u001b[34mSM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate\u001b[0m\n", "\u001b[34mSM_CHANNEL_TESTING=/opt/ml/input/data/testing\u001b[0m\n", "\u001b[34mSM_CHANNEL_TRAINING=/opt/ml/input/data/training\u001b[0m\n", "\u001b[34mSM_HP_BACKEND=gloo\u001b[0m\n", "\u001b[34mSM_HP_EPOCHS=1\u001b[0m\n", "\u001b[34mSM_HP_NUM_LABELS=2\u001b[0m\n", "\u001b[34mPYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python36.zip:/opt/conda/lib/python3.6:/opt/conda/lib/python3.6/lib-dynload:/opt/conda/lib/python3.6/site-packages\u001b[0m\n", "\u001b[34mInvoking script with the following command:\u001b[0m\n", "\u001b[34m/opt/conda/bin/python train_deploy.py --backend gloo --epochs 1 --num_labels 2\u001b[0m\n", "\u001b[35mWARNING: You are using pip version 20.0.1; however, version 21.3.1 is available.\u001b[0m\n", "\u001b[35mYou should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.\u001b[0m\n", "\u001b[35m2022-06-06 10:58:02,176 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2022-06-06 10:58:02,195 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2022-06-06 10:58:02,213 sagemaker-containers INFO No GPUs detected (normal if no gpus installed)\u001b[0m\n", "\u001b[35m2022-06-06 10:58:02,227 sagemaker-containers INFO Invoking user script\u001b[0m\n", "\u001b[35mTraining Env:\u001b[0m\n", "\u001b[35m{\n", " \"additional_framework_parameters\": {},\n", " \"channel_input_dirs\": {\n", " \"testing\": \"/opt/ml/input/data/testing\",\n", " \"training\": 
\"/opt/ml/input/data/training\"\n", " },\n", " \"current_host\": \"algo-2\",\n", " \"framework_module\": \"sagemaker_pytorch_container.training:main\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"hyperparameters\": {\n", " \"backend\": \"gloo\",\n", " \"epochs\": 1,\n", " \"num_labels\": 2\n", " },\n", " \"input_config_dir\": \"/opt/ml/input/config\",\n", " \"input_data_config\": {\n", " \"testing\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", " },\n", " \"training\": {\n", " \"TrainingInputMode\": \"File\",\n", " \"S3DistributionType\": \"FullyReplicated\",\n", " \"RecordWrapperType\": \"None\"\n", " }\n", " },\n", " \"input_dir\": \"/opt/ml/input\",\n", " \"is_master\": false,\n", " \"job_name\": \"pytorch-training-2022-06-06-10-54-41-179\",\n", " \"log_level\": 20,\n", " \"master_hostname\": \"algo-1\",\n", " \"model_dir\": \"/opt/ml/model\",\n", " \"module_dir\": \"s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\",\n", " \"module_name\": \"train_deploy\",\n", " \"network_interface_name\": \"eth0\",\n", " \"num_cpus\": 16,\n", " \"num_gpus\": 0,\n", " \"output_data_dir\": \"/opt/ml/output/data\",\n", " \"output_dir\": \"/opt/ml/output\",\n", " \"output_intermediate_dir\": \"/opt/ml/output/intermediate\",\n", " \"resource_config\": {\n", " \"current_host\": \"algo-2\",\n", " \"current_instance_type\": \"ml.c4.4xlarge\",\n", " \"current_group_name\": \"homogeneousCluster\",\n", " \"hosts\": [\n", " \"algo-1\",\n", " \"algo-2\"\n", " ],\n", " \"instance_groups\": [\n", " {\n", " \"instance_group_name\": \"homogeneousCluster\",\n", " \"instance_type\": \"ml.c4.4xlarge\",\n", " \"hosts\": [\n", " \"algo-2\",\n", " \"algo-1\"\n", " ]\n", " }\n", " ],\n", " \"network_interface_name\": \"eth0\"\n", " },\n", " \"user_entry_point\": \"train_deploy.py\"\u001b[0m\n", "\u001b[35m}\u001b[0m\n", "\u001b[35mEnvironment variables:\u001b[0m\n", "\u001b[35mSM_HOSTS=[\"algo-1\",\"algo-2\"]\u001b[0m\n", "\u001b[35mSM_NETWORK_INTERFACE_NAME=eth0\u001b[0m\n", "\u001b[35mSM_HPS={\"backend\":\"gloo\",\"epochs\":1,\"num_labels\":2}\u001b[0m\n", "\u001b[35mSM_USER_ENTRY_POINT=train_deploy.py\u001b[0m\n", "\u001b[35mSM_FRAMEWORK_PARAMS={}\u001b[0m\n", "\u001b[35mSM_RESOURCE_CONFIG={\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-2\",\"current_instance_type\":\"ml.c4.4xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-2\",\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.c4.4xlarge\"}],\"network_interface_name\":\"eth0\"}\u001b[0m\n", "\u001b[35mSM_INPUT_DATA_CONFIG={\"testing\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}}\u001b[0m\n", "\u001b[35mSM_OUTPUT_DATA_DIR=/opt/ml/output/data\u001b[0m\n", "\u001b[35mSM_CHANNELS=[\"testing\",\"training\"]\u001b[0m\n", "\u001b[35mSM_CURRENT_HOST=algo-2\u001b[0m\n", "\u001b[35mSM_MODULE_NAME=train_deploy\u001b[0m\n", "\u001b[35mSM_LOG_LEVEL=20\u001b[0m\n", "\u001b[35mSM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main\u001b[0m\n", "\u001b[35mSM_INPUT_DIR=/opt/ml/input\u001b[0m\n", "\u001b[35mSM_INPUT_CONFIG_DIR=/opt/ml/input/config\u001b[0m\n", "\u001b[35mSM_OUTPUT_DIR=/opt/ml/output\u001b[0m\n", 
"\u001b[35mSM_NUM_CPUS=16\u001b[0m\n", "\u001b[35mSM_NUM_GPUS=0\u001b[0m\n", "\u001b[35mSM_MODEL_DIR=/opt/ml/model\u001b[0m\n", "\u001b[35mSM_MODULE_DIR=s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\u001b[0m\n", "\u001b[35mSM_TRAINING_ENV={\"additional_framework_parameters\":{},\"channel_input_dirs\":{\"testing\":\"/opt/ml/input/data/testing\",\"training\":\"/opt/ml/input/data/training\"},\"current_host\":\"algo-2\",\"framework_module\":\"sagemaker_pytorch_container.training:main\",\"hosts\":[\"algo-1\",\"algo-2\"],\"hyperparameters\":{\"backend\":\"gloo\",\"epochs\":1,\"num_labels\":2},\"input_config_dir\":\"/opt/ml/input/config\",\"input_data_config\":{\"testing\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"},\"training\":{\"RecordWrapperType\":\"None\",\"S3DistributionType\":\"FullyReplicated\",\"TrainingInputMode\":\"File\"}},\"input_dir\":\"/opt/ml/input\",\"is_master\":false,\"job_name\":\"pytorch-training-2022-06-06-10-54-41-179\",\"log_level\":20,\"master_hostname\":\"algo-1\",\"model_dir\":\"/opt/ml/model\",\"module_dir\":\"s3://sagemaker-ap-southeast-2-431579215499/pytorch-training-2022-06-06-10-54-41-179/source/sourcedir.tar.gz\",\"module_name\":\"train_deploy\",\"network_interface_name\":\"eth0\",\"num_cpus\":16,\"num_gpus\":0,\"output_data_dir\":\"/opt/ml/output/data\",\"output_dir\":\"/opt/ml/output\",\"output_intermediate_dir\":\"/opt/ml/output/intermediate\",\"resource_config\":{\"current_group_name\":\"homogeneousCluster\",\"current_host\":\"algo-2\",\"current_instance_type\":\"ml.c4.4xlarge\",\"hosts\":[\"algo-1\",\"algo-2\"],\"instance_groups\":[{\"hosts\":[\"algo-2\",\"algo-1\"],\"instance_group_name\":\"homogeneousCluster\",\"instance_type\":\"ml.c4.4xlarge\"}],\"network_interface_name\":\"eth0\"},\"user_entry_point\":\"train_deploy.py\"}\u001b[0m\n", "\u001b[35mSM_USER_ARGS=[\"--backend\",\"gloo\",\"--epochs\",\"1\",\"--num_labels\",\"2\"]\u001b[0m\n", "\u001b[35mSM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate\u001b[0m\n", "\u001b[35mSM_CHANNEL_TESTING=/opt/ml/input/data/testing\u001b[0m\n", "\u001b[35mSM_CHANNEL_TRAINING=/opt/ml/input/data/training\u001b[0m\n", "\u001b[35mSM_HP_BACKEND=gloo\u001b[0m\n", "\u001b[35mSM_HP_EPOCHS=1\u001b[0m\n", "\u001b[35mSM_HP_NUM_LABELS=2\u001b[0m\n", "\u001b[35mPYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python36.zip:/opt/conda/lib/python3.6:/opt/conda/lib/python3.6/lib-dynload:/opt/conda/lib/python3.6/site-packages\u001b[0m\n", "\u001b[35mInvoking script with the following command:\u001b[0m\n", "\u001b[35m/opt/conda/bin/python train_deploy.py --backend gloo --epochs 1 --num_labels 2\u001b[0m\n", "\u001b[34mLoading BERT tokenizer...\u001b[0m\n", "\u001b[35mLoading BERT tokenizer...\u001b[0m\n", "\u001b[35mINFO:__main__:Train Epoch: 1 [0/3207 (0%)] Loss: 0.661625\u001b[0m\n", "\u001b[35mDistributed training - True\u001b[0m\n", "\u001b[34mINFO:__main__:Train Epoch: 1 [0/3207 (0%)] Loss: 0.646870\u001b[0m\n", "\u001b[34mDistributed training - True\u001b[0m\n", "\u001b[35mINFO:__main__:Train Epoch: 1 [350/3207 (98%)] Loss: 0.415226\u001b[0m\n", "\u001b[35mNumber of gpus available - 0\u001b[0m\n", "\u001b[35mINFO:__main__:Average training loss: 0.558502\u001b[0m\n", "\u001b[35mInitialized the distributed environment: 'gloo' backend on 2 nodes. Current host rank is 1. 
Number of gpus: 0\u001b[0m\n", "\u001b[35mGet train data loader\u001b[0m\n", "\u001b[34mINFO:__main__:Train Epoch: 1 [350/3207 (98%)] Loss: 0.387279\u001b[0m\n", "\u001b[34mNumber of gpus available - 0\u001b[0m\n", "\u001b[34mINFO:__main__:Average training loss: 0.551094\u001b[0m\n", "\u001b[34mInitialized the distributed environment: 'gloo' backend on 2 nodes. Current host rank is 0. Number of gpus: 0\u001b[0m\n", "\u001b[34mGet train data loader\u001b[0m\n", "\u001b[34mINFO:__main__:Test set: Accuracy: 0.768116\u001b[0m\n", "\u001b[34mProcesses 3207/6413 (50%) of train data\u001b[0m\n", "\u001b[34mProcesses 2138/2138 (100%) of test data\u001b[0m\n", "\u001b[34mINFO:__main__:Saving tuned model.\u001b[0m\n", "\u001b[34mStarting BertForSequenceClassification\u001b[0m\n", "\u001b[34mINFO:transformers.configuration_utils:Configuration saved in /opt/ml/model/config.json\u001b[0m\n", "\u001b[35mINFO:__main__:Test set: Accuracy: 0.797101\u001b[0m\n", "\u001b[35mProcesses 3207/6413 (50%) of train data\u001b[0m\n", "\u001b[35mProcesses 2138/2138 (100%) of test data\u001b[0m\n", "\u001b[35mINFO:__main__:Saving tuned model.\u001b[0m\n", "\u001b[35mStarting BertForSequenceClassification\u001b[0m\n", "\u001b[35mINFO:transformers.configuration_utils:Configuration saved in /opt/ml/model/config.json\u001b[0m\n", "\u001b[34mINFO:transformers.modeling_utils:Model weights saved in /opt/ml/model/pytorch_model.bin\u001b[0m\n", "\u001b[34mEnd of defining BertForSequenceClassification\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.220 algo-1:62 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.221 algo-1:62 INFO hook.py:152] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.221 algo-1:62 INFO hook.py:197] Saving to /opt/ml/output/tensors\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.224 algo-1:62 INFO hook.py:326] Monitoring the collections: losses\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.410 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.410 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.411 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.445 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.622 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.622 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.622 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.694 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", 
"\u001b[34m[2022-06-06 10:58:47.694 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.694 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.728 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.899 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.899 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.899 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.971 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.971 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:47.971 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.005 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.177 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.177 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.177 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.248 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[34m2022-06-06 11:06:25,181 sagemaker-containers INFO Reporting training SUCCESS\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.248 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.248 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.282 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.454 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.454 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, 
module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.454 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.524 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.524 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.524 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.556 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.728 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.728 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.728 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.799 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.799 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.799 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.832 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.953 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.953 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:48.953 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.008 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.008 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.008 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.037 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention 
NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.165 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.165 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.165 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.220 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.221 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.221 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.248 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.368 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.368 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.368 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.422 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.422 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.422 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.450 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.571 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.572 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.572 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.625 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.626 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.626 algo-1:62 
WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.654 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.779 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.779 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.779 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.835 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.835 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.836 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.865 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.988 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.989 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:49.989 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.060 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.060 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.060 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.095 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.266 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.267 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[34m[2022-06-06 10:58:50.267 algo-1:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[34mTrain Epoch: 1 [0/3207 (0%)] Loss: 0.646870\u001b[0m\n", "\u001b[34mTrain Epoch: 1 [350/3207 
(98%)] Loss: 0.387279\u001b[0m\n", "\u001b[34mAverage training loss: 0.551094\u001b[0m\n", "\u001b[34mTest set: Accuracy: 0.768116\u001b[0m\n", "\u001b[34mSaving tuned model.\u001b[0m\n", "\u001b[34m[2022-06-06 11:06:24.832 algo-1:62 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.\u001b[0m\n", "\u001b[35mINFO:transformers.modeling_utils:Model weights saved in /opt/ml/model/pytorch_model.bin\u001b[0m\n", "\u001b[35mEnd of defining BertForSequenceClassification\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.021 algo-2:62 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.021 algo-2:62 INFO hook.py:152] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.022 algo-2:62 INFO hook.py:197] Saving to /opt/ml/output/tensors\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.025 algo-2:62 INFO hook.py:326] Monitoring the collections: losses\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.237 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.237 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.237 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.270 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.405 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.405 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.405 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.0 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.458 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.458 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.459 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.487 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.609 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.609 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.609 algo-2:62 WARNING hook.py:808] 
var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.1 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.661 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.661 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.661 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.692 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.817 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[35m2022-06-06 11:06:25,799 sagemaker-containers INFO Reporting training SUCCESS\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.817 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.818 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.2 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.871 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.871 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.871 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:47.899 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.023 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.024 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.024 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.3 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.074 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.074 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.074 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.101 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4.attention NoneType\u001b[0m\n", 
"\u001b[35m[2022-06-06 10:58:48.229 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.230 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.230 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.4 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.282 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.282 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.283 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.312 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.439 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.439 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.439 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.5 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.492 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.492 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.492 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.521 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.644 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.644 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.644 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.6 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.697 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.697 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.697 algo-2:62 WARNING hook.py:808] var 
is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.726 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.854 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.854 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.854 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.7 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.907 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.907 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.907 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:48.937 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.066 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.066 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.066 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.8 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.120 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.120 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.120 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.147 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.272 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.273 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.273 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.9 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.325 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, 
module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.325 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.325 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.353 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.480 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.480 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.480 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.10 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.533 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.533 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.533 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention.self NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.562 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11.attention NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.686 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.687 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[35m[2022-06-06 10:58:49.687 algo-2:62 WARNING hook.py:808] var is not Tensor or list or tuple of Tensors, module_name:module.bert.encoder.layer.11 NoneType\u001b[0m\n", "\u001b[35mTrain Epoch: 1 [0/3207 (0%)] Loss: 0.661625\u001b[0m\n", "\u001b[35mTrain Epoch: 1 [350/3207 (98%)] Loss: 0.415226\u001b[0m\n", "\u001b[35mAverage training loss: 0.558502\u001b[0m\n", "\u001b[35mTest set: Accuracy: 0.797101\u001b[0m\n", "\u001b[35mSaving tuned model.\u001b[0m\n", "\u001b[35m[2022-06-06 11:06:25.488 algo-2:62 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.\u001b[0m\n", "\n", "2022-06-06 11:06:31 Uploading - Uploading generated training model\n", "2022-06-06 11:07:57 Completed - Training job completed\n", "Training seconds: 1330\n", "Billable seconds: 1330\n" ] } ], "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", "# place to save model artifact\n", "output_path = f\"s3://{bucket}/{prefix}\"\n", "\n", "estimator = PyTorch(\n", " entry_point=\"train_deploy.py\",\n", " source_dir=\"code\",\n", " role=role,\n", " framework_version=\"1.3.1\",\n", " py_version=\"py3\",\n", " instance_count=2, # this script only support distributed training for GPU instances.\n", " 
instance_type=\"ml.c4.4xlarge\",\n", " output_path=output_path,\n", " hyperparameters={\n", " \"epochs\": 1,\n", " \"num_labels\": 2,\n", " \"backend\": \"gloo\",\n", " },\n", " disable_profiler=True, # disable debugger\n", ")\n", "estimator.fit({\"training\": inputs_train, \"testing\": inputs_test})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Host" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training our model, we host it on an Amazon SageMaker Endpoint. To make the endpoint load the model and serve predictions, we implement a few methods in `train_deploy.py`.\n", "\n", "* `model_fn()`: function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn.\n", "* `input_fn()`: deserializes and prepares the prediction input. In this example, our request body is first serialized to JSON and then sent to model serving endpoint. Therefore, in `input_fn()`, we first deserialize the JSON-formatted request body and return the input as a `torch.tensor`, as required for BERT.\n", "* `predict_fn()`: performs the prediction and returns the result.\n", "\n", "To deploy our endpoint, we call `deploy()` on our PyTorch estimator object, passing in our desired number of instances and instance type:\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "------!" ] } ], "source": [ "predictor = estimator.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then configure the predictor to use `application/json` for the content type when sending requests to our endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we use the returned predictor object to call the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = predictor.predict(\"Somebody just left - guess who.\")\n", "print(\"predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see the predicted class is 1 as expected because test sentence is a grammatically correct sentence. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before moving on, let's delete the Amazon SageMaker endpoint to avoid charges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use a pretrained model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to reuse pretrained model, you can create a `PyTorchModel` from existing model artifacts. For example,\n", "we can retrieve model artifacts we just trained. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_data = estimator.model_data\n", "print(model_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from sagemaker.pytorch.model import PyTorchModel \n", "\n", "pytorch_model = PyTorchModel(model_data=model_data,\n", " role=role,\n", " framework_version=\"1.3.1\",\n", " source_dir=\"code\",\n", " py_version=\"py3\",\n", " entry_point=\"train_deploy.py\")\n", "\n", "predictor = pytorch_model.deploy(initial_instance_count=1, instance_type=\"ml.m4.xlarge\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = predictor.predict(\"Remember to delete me when are done\")\n", "print(\"predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# batch inference \n", "result = predictor.predict([\n", " \"This is how you do batch inference\", \n", " \"Put several sentences in a list\",\n", " \"Make sure they are shorter than 64 words\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Predicted class: \", np.argmax(result, axis=1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Elastic Inference\n", "\n", "Selecting the right instance type for inference requires deciding between different amounts of GPU, CPU, and memory resources, and optimizing for one of these resources on a standalone GPU instance usually leads to under-utilization of other resources. [Amazon Elastic Inference](https://aws.amazon.com/machine-learning/elastic-inference/) solves this problem by enabling us to attach the right amount of GPU-powered inference acceleration to our endpoint. In March 2020, [Elastic Inference support for PyTorch became available](https://aws.amazon.com/blogs/machine-learning/reduce-ml-inference-costs-on-amazon-sagemaker-for-pytorch-models-using-amazon-elastic-inference/) for both Amazon SageMaker and Amazon EC2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use Elastic Inference, we must convert our trained model to TorchScript. The location of the model artifacts is `estimator.model_data`. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we create a folder to save model trained model, and download the `model.tar.gz` file to local directory. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sh -s $estimator.model_data\n", "mkdir model\n", "aws s3 cp $1 model/ \n", "tar xvzf model/model.tar.gz --directory ./model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code converts our model into the TorchScript format:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "import torch\n", "from transformers import BertForSequenceClassification\n", "\n", "model_torchScript = BertForSequenceClassification.from_pretrained(\"model/\", torchscript=True)\n", "device = \"cpu\"\n", "# max length for the sentences: 64\n", "max_len = 64\n", "\n", "for_jit_trace_input_ids = [0] * max_len\n", "for_jit_trace_attention_masks = [0] * max_len\n", "for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])\n", "for_jit_trace_masks = torch.tensor([for_jit_trace_input_ids])\n", "\n", "traced_model = torch.jit.trace(\n", " model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)]\n", ")\n", "torch.jit.save(traced_model, \"traced_bert.pt\")\n", "\n", "subprocess.call([\"tar\", \"-czvf\", \"traced_bert.tar.gz\", \"traced_bert.pt\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Loading the TorchScript model and using it for prediction require small changes in our model loading and prediction functions. We create a new script `deploy_ei.py` that is slightly different from `train_deploy.py` script." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize code/deploy_ei.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we upload TorchScript model to S3 and deploy using Elastic Inference. The accelerator_type=`ml.eia2.xlarge` parameter is how we attach the Elastic Inference accelerator to our endpoint." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch import PyTorchModel\n", "\n", "instance_type = 'ml.m5.large'\n", "accelerator_type = 'ml.eia2.xlarge'\n", "\n", "# TorchScript model\n", "tar_filename = 'traced_bert.tar.gz'\n", "\n", "# Returns S3 bucket URL\n", "print('Upload tarball to S3')\n", "model_data = sagemaker_session.upload_data(path=tar_filename, bucket=bucket, key_prefix=prefix)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "endpoint_name = 'bert-ei-traced-{}-{}-{}'.format(instance_type, \n", " accelerator_type, time.time()).replace('.', '').replace('_', '')\n", "\n", "pytorch = PyTorchModel(\n", " model_data=model_data,\n", " role=role,\n", " entry_point='deploy_ei.py',\n", " source_dir='code',\n", " framework_version='1.3.1',\n", " py_version='py3',\n", " sagemaker_session=sagemaker_session\n", ")\n", "\n", "# Function will exit before endpoint is finished creating\n", "predictor = pytorch.deploy(\n", " initial_instance_count=1,\n", " instance_type=instance_type,\n", " accelerator_type=accelerator_type,\n", " endpoint_name=endpoint_name,\n", " wait=True,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = predictor.predict('Please remember to delete me when you are done.')\n", "print(\"Predicted class:\", np.argmax(res, axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cleanup\n", "\n", "Lastly, please remember to delete the Amazon SageMaker endpoint to avoid charges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint()" ] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 4 }