# Lab 3: Deploy Hugging Face Transformers in SageMaker Real-time Endpoint
---

## Introduction
---

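In this module, we deploy a Hugging Face model to a real-time endpoint. Because SageMaker provides prebuilt Hugging Face inference containers and the Hugging Face Inference Toolkit, deployment follows the same procedure as any other SageMaker endpoint. In addition, as a Hugging Face-specific feature, you can import a model registered on the Hugging Face Hub (https://huggingface.co/models) directly and deploy it to an endpoint. See the example code below.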

```python
hub = {
    'HF_MODEL_ID': model_id, 
    'HF_TASK':'text-classification' 
}

hf_hub_model = HuggingFaceModel(
    env=hub,
    ...
)
```
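To make the snippet above more concrete, here is a minimal sketch of deploying a Hub model. The helper names `build_hub_env` and `deploy_from_hub` are hypothetical, the version strings mirror the commented-out `HuggingFaceModel` cell later in this lab, and the instance type is an assumption; adjust them to a combination that the available DLC images support.

```python
def build_hub_env(model_id, task):
    """Return the container environment that selects a Hub model and task."""
    if not model_id or not task:
        raise ValueError("model_id and task must be non-empty strings")
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


def deploy_from_hub(model_id, task, role, instance_type="ml.c5.xlarge"):
    """Hypothetical helper: deploy a Hub model to a real-time endpoint."""
    # Imported lazily so build_hub_env stays usable without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFaceModel

    hf_hub_model = HuggingFaceModel(
        env=build_hub_env(model_id, task),
        role=role,
        transformers_version="4.6",  # assumed; copied from a later cell of this lab
        pytorch_version="1.7",       # assumed
        py_version="py36",           # assumed
    )
    return hf_hub_model.deploy(
        initial_instance_count=1, instance_type=instance_type
    )


print(build_hub_env("bert-base-multilingual-cased", "text-classification"))
```

Calling `deploy_from_hub(...)` creates billable resources, so the sketch only prints the environment dictionary.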

The SageMaker Hugging Face Inference Toolkit uses [Multi Model Server (MMS)](https://github.com/awslabs/multi-model-server) to serve ML models. It bootstraps MMS with a SageMaker-compatible configuration and lets you tune important performance parameters, such as the number of workers per model, to match the requirements of your scenario.
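As one illustration of such tuning, the SageMaker inference toolkits, to the best of my understanding, read the worker count from the `SAGEMAKER_MODEL_SERVER_WORKERS` environment variable. The sketch below only builds the environment dictionary you would pass as `env=...` to the model. The helper name `with_model_server_workers` is hypothetical, and the right worker count depends on your instance and model, so treat the value as an assumption to benchmark.

```python
def with_model_server_workers(env, workers):
    """Return a copy of env with the model server worker count set."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    updated = dict(env)
    # Environment variable values must be strings.
    updated["SAGEMAKER_MODEL_SERVER_WORKERS"] = str(workers)
    return updated


hub = {
    "HF_MODEL_ID": "bert-base-multilingual-cased",
    "HF_TASK": "text-classification",
}
print(with_model_server_workers(hub, 2))
```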

If you need example code for a wider range of use cases, or want hands-on practice customizing the scripts used for inference (BYOS; Bring Your Own Scripts), see the URLs below.

- SageMaker Hugging Face Inference Toolkit: https://github.com/aws/sagemaker-huggingface-inference-toolkit
- Amazon SageMaker Deep Learning Inference Hands-on-Lab: https://github.com/aws-samples/sagemaker-inference-samples-kr


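Creating an endpoint consists of the following three steps.
1. **Create a Model**: Create the model that SageMaker needs for deployment. This sets the inference container image and the S3 path of the model artifacts.
2. **Create an Endpoint Configuration**: Specify one or more model names in production variants, along with the inference hosting instance type that SageMaker provisions to host each production variant.
3. **Create an Endpoint**: Create the endpoint from the endpoint configuration. This provisions the hosting instances and deploys the model.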
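The three steps above can be sketched with the low-level SageMaker API. This is a minimal illustration, not the code this lab runs: `create_realtime_endpoint` is a hypothetical helper, it reuses one name for the model, the endpoint configuration, and the endpoint for brevity, and `sm_client` is assumed to be a client such as `boto3.client("sagemaker")`.

```python
def create_realtime_endpoint(sm_client, name, image_uri, model_data_url,
                             role_arn, instance_type="ml.c5.xlarge",
                             instance_count=1):
    """Run the three creation steps with a SageMaker client."""
    # Step 1: model = inference container image + S3 path of the artifacts.
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # Step 2: endpoint configuration = production variants and instance types.
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    # Step 3: endpoint = provisioned instances serving that configuration.
    return sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)
```

Passing the client in as an argument keeps the function easy to exercise with a stub before it touches a real account.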


In [1]:
%load_ext autoreload
%autoreload 2
%store -r
%store

Stored variables and their in-db values:
local_model_dir             -> './model'
model_id                    -> 'bert-base-multilingual-cased'
s3_model_path               -> 's3://sagemaker-us-east-1-143656149352/kornlp-ner-
tokenizer_id                -> 'bert-base-multilingual-cased'


In [2]:
try:
    model_id 
    tokenizer_id
    s3_model_path
    local_model_dir
    print("[OK] You can proceed.")
except NameError:
    print("+"*60)
    print("[ERROR] Please run the previous hands-on lab before you continue.")
    print("+"*60)

[OK] You can proceed.


In [3]:
import os
import json
import sys
import logging
import boto3
import sagemaker
import pandas as pd
from sagemaker.huggingface import HuggingFaceModel
from sagemaker import session
from transformers import ElectraConfig
from transformers import (
    ElectraModel, ElectraTokenizer, ElectraForSequenceClassification
)

logging.basicConfig(
    level=logging.INFO, 
    format='[{%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)


sess = sagemaker.Session()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

<br>

## 1. Model Serving Preparation
---

### Create Model Serving Script

The code cell below saves the SageMaker inference script to the `scripts` directory.

#### Option 1.
- `model_fn(model_dir)`: Loads the model artifacts that SageMaker downloads from S3 into `model_dir`.
- `input_fn(request_body, content_type)`: Preprocesses the input data. `content_type` lets you handle different input formats (e.g., `application/x-npy`, `application/json`, `text/csv`).
- `predict_fn(input_object, model)`: Runs inference on the data returned by `input_fn(...)`.
- `output_fn(prediction, accept_type)`: Post-processes the inference result from `predict_fn(...)` and returns it to the client.

#### Option 2.
- `model_fn(model_dir)`: Loads the model artifacts that SageMaker downloads from S3 into `model_dir`.
- `transform_fn(model, request_body, content_type, accept_type)`: Combines `input_fn(...)`, `predict_fn(...)`, and `output_fn(...)` into a single `transform_fn(...)`.
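The relationship between the two options can be shown with a toy example. The sketch below is not the lab's inference script: the "model" is just Python's built-in `len`, and the JSON payload shape is an assumption chosen for illustration. It shows that `transform_fn` is the composition of the three Option 1 handlers.

```python
import json


def input_fn(request_body, content_type="application/json"):
    """Deserialize the request body into a list of texts."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(request_body)["text"]


def predict_fn(input_object, model):
    """Apply the model to every input text."""
    return [model(text) for text in input_object]


def output_fn(prediction, accept_type="application/json"):
    """Serialize the prediction for the client."""
    return json.dumps({"output": prediction})


def transform_fn(model, request_body, content_type, accept_type):
    """Option 2: one function that chains the three Option 1 handlers."""
    inputs = input_fn(request_body, content_type)
    prediction = predict_fn(inputs, model)
    return output_fn(prediction, accept_type)


body = '{"text": ["ab", "cde"]}'
print(transform_fn(len, body, "application/json", "application/json"))
```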

In [4]:
%%writefile scripts/inference.py
import os
import sys
import json
import torch
import logging
import numpy as np
from transformers import BertTokenizerFast, BertConfig, BertForTokenClassification, pipeline
os.environ["TOKENIZERS_PARALLELISM"] = "false"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

logging.basicConfig(
    level=logging.INFO, 
    format='[{%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(sys.stdout)
    ]
)
logger = logging.getLogger(__name__)


def model_fn(model_dir):
    tokenizer = BertTokenizerFast.from_pretrained(f'{model_dir}')

    with open(os.path.join(model_dir, 'tag2id.json'), 'r') as f:
        tag2id = json.loads(f.read())

    with open(os.path.join(model_dir, 'id2tag.json'), 'r') as f:
        id2tag = json.loads(f.read())    

    with open(os.path.join(model_dir, 'tag2entity.json'), 'r') as f:
        tag2entity = json.loads(f.read())

    model_file = 'pytorch_model.bin'
    model_id = 'bert-base-multilingual-cased'
    model = BertForTokenClassification.from_pretrained(model_id, num_labels=len(id2tag))
    
    tag2id = {k:int(v) for k,v in tag2id.items()}     
    id2tag = {int(k):v for k,v in id2tag.items()}  
    
    model.config.id2label = id2tag
    model.config.label2id = tag2id
    model.load_state_dict(torch.load(f'{model_dir}/{model_file}', map_location=torch.device(device)))
    model = model.eval()
    return (model, tokenizer)


def input_fn(input_data, content_type="application/jsonlines"): 
    
    data_str = input_data.decode("utf-8")
    jsonlines = data_str.split("\n")
    inputs = []

    for jsonline in jsonlines:
        text = json.loads(jsonline)["text"][0]
        logger.info("input text: {}".format(text)) 
        inputs.append(text)
        
    return inputs


def predict_fn(inputs, model_tuple): 
    model, tokenizer = model_tuple
    device_id = -1 if device.type == "cpu" else 0
    outputs = []
    # Build the NER pipeline once and reuse it for every input.
    nlp = pipeline("ner", model=model.to(device), device=device_id, 
                   tokenizer=tokenizer, aggregation_strategy='average')
    for example in inputs:
        output = nlp(example)
        logger.info("predicted_results: {}".format(output))
        print("predicted_results: {}".format(output))
        outputs.append(output)

    # Note: only the entities of the first input are serialized and returned.
    output = outputs[0]
    jsonlines = []

    for entity in output:
        for k, v in entity.items():
            # np.float32 is not JSON serializable, so convert it to a Python float.
            if isinstance(v, np.float32):
                entity[k] = v.item()

        jsonline = json.dumps(entity)
        jsonlines.append(jsonline)

    jsonlines_output = '\n'.join(jsonlines)

    return jsonlines_output


def output_fn(outputs, accept="application/jsonlines"):
    return outputs, accept

Overwriting scripts/inference.py


### Check Inference Results & Debugging
Before deploying to a local or hosted endpoint, run inference directly in the local environment and check the results.

In [5]:
from scripts.inference import model_fn, input_fn, predict_fn
model_tuple = model_fn('./model')
model_sample_path = 'payload_samples.txt'

Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertForTokenClassification: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at 

In [6]:
with open(model_sample_path, 'w') as file:
    file.write('{"text": ["아마존 SageMaker는 머신 러닝 통합 엔드투엔드 관리형 서비스로 2017년 런칭되었다."]}')
    
with open(model_sample_path, mode='rb') as file:
    request_body = file.read() 
    
inputs = input_fn(request_body)
outputs = predict_fn(inputs, model_tuple)    

[{inference.py:55} INFO - input text: 아마존 SageMaker는 머신 러닝 통합 엔드투엔드 관리형 서비스로 2017년 런칭되었다.
[{inference.py:70} INFO - predicted_results: [{'entity_group': 'ORG_B', 'score': 0.93706465, 'word': '아마존', 'start': 0, 'end': 3}, {'entity_group': 'TRM_I', 'score': 0.52719957, 'word': 'SageMaker는', 'start': 4, 'end': 14}, {'entity_group': 'TRM_B', 'score': 0.92613167, 'word': '머신', 'start': 15, 'end': 17}, {'entity_group': 'TRM_I', 'score': 0.5666977, 'word': '러닝', 'start': 18, 'end': 20}, {'entity_group': '', 'score': 0.67403936, 'word': '통합', 'start': 21, 'end': 23}, {'entity_group': 'TRM_I', 'score': 0.45783806, 'word': '엔드투엔드', 'start': 24, 'end': 29}, {'entity_group': '', 'score': 0.44695696, 'word': '관리형', 'start': 30, 'end': 33}, {'entity_group': 'TRM_I', 'score': 0.54463005, 'word': '서비스로', 'start': 34, 'end': 38}, {'entity_group': 'DAT_B', 'score': 0.9937644, 'word': '2017년', 'start': 39, 'end': 44}, {'entity_group': '', 'score': 0.9995462, 'word': '런칭되었다.', 'start': 45, 'end': 51}]
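One detail worth checking during local debugging is how the payload is split. `input_fn` above calls `json.loads` on every piece produced by `split("\n")`, so a payload that ends with a newline would raise a `JSONDecodeError` on the empty last piece. A minimal sketch of a more tolerant parser follows; `parse_jsonlines` is a hypothetical helper, not part of the lab script.

```python
import json


def parse_jsonlines(request_body):
    """Decode a JSON Lines payload into a list of input texts."""
    texts = []
    for line in request_body.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. after a trailing newline
        # Each record looks like {"text": ["..."]}; take the first text.
        texts.append(json.loads(line)["text"][0])
    return texts


payload = b'{"text": ["first sentence"]}\n{"text": ["second sentence"]}\n'
print(parse_jsonlines(payload))
```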

<br>

## 2. Deploy to Local Environment

---

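Before deploying to a SageMaker hosted endpoint, you can deploy to a local mode endpoint. Local mode runs Docker containers in your current development environment to emulate SageMaker processing, training, and inference jobs. For inference, it pulls a deep learning framework inference container from Amazon ECR (docker pull) and runs it (docker run) to start the model server.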

https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html

```python
local_model_path = f'{os.getcwd()}/model'
ecr_uri = image_uri

# Start the Docker container
!docker run --name smmodel -itd -p 8080:8080 -v {local_model_path}:/opt/ml/model {ecr_uri} serve

# Test a real-time invocation
!curl -X POST -H 'Content-Type: application/json' localhost:8080/invocations -d ...

# Stop and remove the Docker container
!docker stop smmodel
!docker rm smmodel
```
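The `curl` call above can also be written in Python. The sketch below only builds the request object so that it can be inspected without a running container. `build_invocation_request` is a hypothetical helper, and the `application/jsonlines` content type is an assumption that matches how this lab's `input_fn` parses its input.

```python
import json
import urllib.request


def build_invocation_request(records, host="localhost", port=8080):
    """Build a POST request for the container's /invocations route."""
    # One JSON document per line; ensure_ascii=False keeps Korean text readable.
    body = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/jsonlines"},
        method="POST",
    )


req = build_invocation_request([{"text": ["SageMaker was launched in 2017."]}])
print(req.get_method(), req.full_url)

# Send it only while the container is running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```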

Note that when you deploy an endpoint with the SageMaker SDK `deploy(...)` method, setting the instance type to `local` or `local_gpu` performs the steps above automatically.

```python
# Deploy a local endpoint
local_predictor = local_model.deploy(initial_instance_count=1, instance_type="local")

# Test a real-time invocation
local_predictor.predict(...)

# Delete the local endpoint (stops and removes the Docker container)
local_predictor.delete_endpoint()
```

As the code below shows, you can easily keep up with continually updated Python, framework, and Transformers versions. Check the list of AWS-managed Deep Learning Containers (DLC) at the address below.

https://github.com/aws/deep-learning-containers/blob/master/available_images.md
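The image name that appears in this lab's local mode logs follows a `<framework>-inference:<version>-<device>-<python>` pattern. The sketch below composes such a URI by hand purely to illustrate the naming. `dlc_image_uri` is a hypothetical helper, the default account ID is the one seen in this lab's logs and may differ by region, and actual tags can carry extra suffixes, so in practice look the image up in the list above or with `sagemaker.image_uris.retrieve(...)`.

```python
def dlc_image_uri(framework, version, device, py_version, region,
                  account="763104351884"):
    """Compose an inference DLC image URI from its parts (illustrative only)."""
    if device not in ("cpu", "gpu"):
        raise ValueError("device must be 'cpu' or 'gpu'")
    registry = f"{account}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{framework}-inference:{version}-{device}-{py_version}"


print(dlc_image_uri("pytorch", "1.8.1", "cpu", "py3", "us-east-1"))
```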

In [12]:
import os
import time
import sagemaker
from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer, JSONLinesSerializer, IdentitySerializer
from sagemaker.deserializers import JSONDeserializer, JSONLinesDeserializer
#from sagemaker.pytorch.model import PyTorchModel
role = sagemaker.get_execution_role()
endpoint_name = "local-endpoint-pytorch-{}".format(int(time.time()))
local_model_path = f'file://{os.getcwd()}/{local_model_dir}/model.tar.gz'

### Create Endpoint

When you call the SageMaker SDK `deploy(...)` method, it creates both the endpoint configuration (create-endpoint-config) and the endpoint (create-endpoint). For finer-grained parameter control, we recommend using the AWS CLI or the boto3 SDK client.

In [13]:
model = PyTorchModel(
    model_data=local_model_path,
    role=role,
    entry_point='inference.py', 
    source_dir='scripts',
    framework_version='1.8.1',
    py_version='py3'
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='local',
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer()
)

[{session.py:2668} INFO - Creating model with name: pytorch-inference-2022-06-13-08-53-27-217
[{session.py:3585} INFO - Creating endpoint-config with name pytorch-inference-2022-06-13-08-53-27-218
[{session.py:3053} INFO - Creating endpoint with name pytorch-inference-2022-06-13-08-53-27-218
[{image.py:270} INFO - serving
[{image.py:273} INFO - creating hosting dir in /tmp/tmpsvvw65dp
[{image.py:1012} INFO - No AWS credentials found in session but credentials from EC2 Metadata Service are available.
[{image.py:685} INFO - docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-zqdqk:
    command: serve
    container_name: cx2mwac59i-algo-1-zqdqk
    environment:
    - '[Masked]'
    - '[Masked]'
    - '[Masked]'
    - '[Masked]'
    image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py3
    networks:
      sagemaker-local:
        aliases:
        - algo-1-zqdqk
    ports:
    - 8080:8080
    stdin_open: true
    tty:

[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 45
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 50
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 55
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 60
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 65
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 70
[{entities.py:635} INFO - Container still not up, got: -1
Attaching to cx2mwac59i-algo-1-zqdqk
cx2mwac59i-algo-1-zqdqk | Collecting transformers
cx2mwac59i-algo-1-zqdqk |   Downloading

cx2mwac59i-algo-1-zqdqk |   Downloading zipp-3.6.0-py3-none-any.whl (5.3 kB)
cx2mwac59i-algo-1-zqdqk | Collecting click
cx2mwac59i-algo-1-zqdqk |   Downloading click-8.0.4-py3-none-any.whl (97 kB)
     |████████████████████████████████| 97 kB 12.9 MB/s
cx2mwac59i-algo-1-zqdqk | Building wheels for collected packages: sacremoses
[{entities.py:632} INFO - Checking if serving container is up, attempt: 75
[{entities.py:635} INFO - Container still not up, got: -1
cx2mwac59i-algo-1-zqdqk |   Building wheel for sacremoses (setup.py) ... done
cx2mwac59i-algo-1-zqdqk |   Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895259 sha256=7bf713231461fd434a535731f9896c7b2adfec471f7b3cbddf262e00c04817b3
cx2mwac59i-algo-1-zqdqk |   Stored in directory: /root/.cache/pip/wheels/4c/64/31/e9900a234b23fb3e9dc565d6114a9d6ff84a72dbdd356502b4
cx2mwac59i-algo-1-zqdqk | Successfully built

[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 90
[{entities.py:635} INFO - Container still not up, got: -1
[{entities.py:632} INFO - Checking if serving container is up, attempt: 95
[{entities.py:635} INFO - Container still not up, got: -1
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:09,059 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:09,107 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:09,339 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:09,345 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:09,352 [INFO ] main or

cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:12,706 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   1%|          | 7.05M/681M [00:00<00:09, 73.9MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:12,808 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   2%|▏         | 14.1M/681M [00:00<00:09, 74.2MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:12,908 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   3%|▎         | 21.2M/681M [00:00<00:09, 73.6MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:13,008 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   4%|▍         | 28.3M/681M [00:00<00:09, 73.9MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:13,108 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   5%|▌         | 35.4M/681M [00:00<00:09, 73.8MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:13,208 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:   6%|▋         | 42.6M/681M [00:00<00:08, 74.4MB/s]

cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:17,851 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  57%|█████▋    | 389M/681M [00:05<00:03, 80.6MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:17,952 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  58%|█████▊    | 396M/681M [00:05<00:03, 80.5MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:18,051 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  59%|█████▉    | 404M/681M [00:05<00:03, 80.6MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:18,151 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  60%|██████    | 412M/681M [00:05<00:03, 80.7MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:18,251 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  62%|██████▏   | 420M/681M [00:05<00:03, 80.8MB/s]
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:55:18,352 [WARN ] W-9001-model_1-stderr MODEL_LOG - Downloading:  63%|██████▎   | 427M/681M [00:05<00:03, 80.8MB/s]

You can confirm that the Docker container for model serving is running.

In [15]:
!docker ps

CONTAINER ID   IMAGE                                                                          COMMAND                  CREATED         STATUS         PORTS                                                 NAMES
dc9d06b0ca46   763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py3   "python /usr/local/b…"   3 minutes ago   Up 3 minutes   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp, 8081/tcp   cx2mwac59i-algo-1-zqdqk


### Sample data prediction

Run inference on sample data.

In [16]:
data = [
    {"text": ["아마존 SageMaker는 머신 러닝 통합 엔드투엔드 관리형 서비스로 2017년 re:Invent 행사가 열린 라스베가스에서 발표되었다."]}
]

In [17]:
results = predictor.predict(data)

cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:57:26,152 [INFO ] W-9002-model_1-stdout MODEL_LOG - input text: 아마존 SageMaker는 머신 러닝 통합 엔드투엔드 관리형 서비스로 2017년 re:Invent 행사가 열린 라스베가스에서 발표되었다.
cx2mwac59i-algo-1-zqdqk | 2022-06-13 08:57:26,627 [INFO ] W-9002-model_1-stdout MODEL_LOG - predicted_results: [{'entity_group': 'ORG_B', 'score': 0.97080106, 'word': '아마존', 'start': 0, 'end': 3}, {'entity_group': 'TRM_B', 'score': 0.74606895, 'word': 'SageMaker는 머신', 'start': 4, 'end': 17}, {'entity_group': 'TRM_I', 'score': 0.69509137, 'word': '러닝', 'start': 18, 'end': 20}, {'entity_group': '', 'score': 0.534267, 'word': '통합', 'start': 21, 'end': 23}, {'entity_group': 'TRM_I', 'score': 0.54091847, 'word': '엔드투엔드', 'start': 24, 'end': 29}, {'entity_group': '', 'score': 0.53203374, 'word': '관리형', 'start': 30, 'end': 33}, {'entity_group': 'TRM_I', 'score': 0.48312756, 'word': '서비스로', 'start': 34, 'end': 38}, {'entity_group': 'DAT_B', 'score': 0.9962328, 'word': '2017년', 'start': 39, 'end': 4

In [18]:
def display_ner_outputs(results, tag2entity):
    entity_lst, score_lst, word_lst, start_lst, end_lst = [], [], [], [], []
    tag2entity[''] = '-'

    for result in results:
        entity = tag2entity[result['entity_group']]
        score = result['score']
        word = result['word']
        start = result['start']
        end = result['end']

        entity_lst.append(entity)
        score_lst.append(score)
        word_lst.append(word)
        start_lst.append(start)
        end_lst.append(end)

    df = pd.DataFrame(zip(word_lst, entity_lst, score_lst, start_lst, end_lst), 
                      columns=['word', 'entity', 'score', 'start', 'end'])
    return df

In [19]:
local_model_dir = 'model'
with open(os.path.join(local_model_dir, 'tag2entity.json'), 'r') as f:
    tag2entity = json.loads(f.read())
display_ner_outputs(results, tag2entity)

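|   | word | entity | score | start | end |
|---:|:---|:---|---:|---:|---:|
| 0 | 아마존 | Organization | 0.970801 | 0 | 3 |
| 1 | SageMaker는 머신 | Term | 0.746069 | 4 | 17 |
| 2 | 러닝 | Term | 0.695091 | 18 | 20 |
| 3 | 통합 | - | 0.534267 | 21 | 23 |
| 4 | 엔드투엔드 | Term | 0.540918 | 24 | 29 |
| 5 | 관리형 | - | 0.532034 | 30 | 33 |
| 6 | 서비스로 | Term | 0.483128 | 34 | 38 |
| 7 | 2017년 | Date | 0.996233 | 39 | 44 |
| 8 | re : | Event | 0.802971 | 45 | 48 |
| 9 | Invent 행사가 | Event | 0.731569 | 48 | 58 |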


In [20]:
predictor.delete_endpoint()

[{session.py:3113} INFO - Deleting endpoint configuration with name: pytorch-inference-2022-06-13-08-53-27-218
[{session.py:3103} INFO - Deleting endpoint with name: pytorch-inference-2022-06-13-08-53-27-218
Gracefully stopping... (press Ctrl+C again to force)


<br>

## 3. Deploy to Hosting Instance
---
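Once you have debugged thoroughly in local mode, it is time to deploy to an actual hosting instance. The code is nearly identical: apart from `model_data` pointing to S3, the key difference is `instance_type`.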

### Create Endpoint

In [23]:
from sagemaker.pytorch import PyTorchModel
from sagemaker.serializers import JSONSerializer, JSONLinesSerializer, IdentitySerializer
from sagemaker.deserializers import JSONDeserializer, JSONLinesDeserializer

model = PyTorchModel(
    model_data=f"{s3_model_path}",  # path to your trained SageMaker model
    role=role,                      # IAM role with permissions to create an endpoint   
    entry_point='inference.py',
    source_dir='scripts',
    framework_version="1.8.1",      # PyTorch version used
    py_version='py3',              # Python version used
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(), 
    wait=False
)

[{session.py:2668} INFO - Creating model with name: pytorch-inference-2022-06-13-08-59-04-856
[{session.py:3585} INFO - Creating endpoint-config with name pytorch-inference-2022-06-13-08-59-05-123
[{session.py:3053} INFO - Creating endpoint with name pytorch-inference-2022-06-13-08-59-05-123


In [None]:
# from sagemaker.huggingface.model import HuggingFaceModel
# model = HuggingFaceModel(
#     model_data=f"{s3_model_path}",  # path to your trained SageMaker model
#     role=role, 
#     transformers_version="4.6",
#     pytorch_version="1.7",

#     #entry_point='inference.py',
#     source_dir='scripts',
#     #framework_version="1.8.1",                              # PyTorch version used
#     py_version='py36',                                    # Python version used
# )

# predictor = model.deploy(
#     initial_instance_count=1,
#     instance_type="ml.c5.xlarge",
#     wait=False
# )

### Wait for the endpoint jobs to complete
Wait until the endpoint is created. Provisioning the hosting resources behind the endpoint takes a few minutes.

In [24]:
from IPython.display import display, HTML
def make_endpoint_link(region, endpoint_name, endpoint_task):
    endpoint_link = f'<b><a target="_blank" href="https://console.aws.amazon.com/sagemaker/home?region={region}#/endpoints/{endpoint_name}">{endpoint_task} Review Endpoint</a></b>'   
    return endpoint_link 
        
endpoint_link = make_endpoint_link(region, predictor.endpoint_name, '[Deploy model from S3]')
display(HTML(endpoint_link))

In [25]:
sess.wait_for_endpoint(predictor.endpoint_name, poll=5)

--------------------------------------!

{'EndpointName': 'pytorch-inference-2022-06-13-08-59-05-123',
 'EndpointArn': 'arn:aws:sagemaker:us-east-1:143656149352:endpoint/pytorch-inference-2022-06-13-08-59-05-123',
 'EndpointConfigName': 'pytorch-inference-2022-06-13-08-59-05-123',
 'ProductionVariants': [{'VariantName': 'AllTraffic',
   'DeployedImages': [{'SpecifiedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.8.1-cpu-py3',
     'ResolvedImage': '763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference@sha256:ea8411872129fd66a7712a2d21564b82cb165628bd534ce3a587d3c1ec6241cd',
     'ResolutionTime': datetime.datetime(2022, 6, 13, 8, 59, 7, 370000, tzinfo=tzlocal())}],
   'CurrentWeight': 1.0,
   'DesiredWeight': 1.0,
   'CurrentInstanceCount': 1,
   'DesiredInstanceCount': 1}],
 'EndpointStatus': 'InService',
 'CreationTime': datetime.datetime(2022, 6, 13, 8, 59, 5, 398000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2022, 6, 13, 9, 2, 14, 93000, tzinfo=tzlocal()),
 'ResponseMeta
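`sess.wait_for_endpoint(...)` polls the endpoint status for you. For readers who use the boto3 client directly, a minimal sketch of the same idea follows. `wait_until_in_service` is a hypothetical helper, `sm_client` is assumed to be `boto3.client("sagemaker")`, and the polling limits are arbitrary assumptions.

```python
import time


def wait_until_in_service(sm_client, endpoint_name, poll_seconds=5,
                          max_polls=120):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        # Still Creating (or Updating): wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Endpoint {endpoint_name} not in service after {max_polls} polls"
    )
```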

### Sample data prediction

Run inference on sample data.

In [26]:
data = [
    {"text": ["아마존 SageMaker는 머신 러닝 통합 엔드투엔드 관리형 서비스로 2017년 re:Invent 행사가 열린 라스베가스에서 발표되었다."]}
]
results = predictor.predict(data)
display_ner_outputs(results, tag2entity)

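|   | word | entity | score | start | end |
|---:|:---|:---|---:|---:|---:|
| 0 | 아마존 | Organization | 0.970801 | 0 | 3 |
| 1 | SageMaker는 머신 | Term | 0.746069 | 4 | 17 |
| 2 | 러닝 | Term | 0.695091 | 18 | 20 |
| 3 | 통합 | - | 0.534267 | 21 | 23 |
| 4 | 엔드투엔드 | Term | 0.540918 | 24 | 29 |
| 5 | 관리형 | - | 0.532034 | 30 | 33 |
| 6 | 서비스로 | Term | 0.483128 | 34 | 38 |
| 7 | 2017년 | Date | 0.996233 | 39 | 44 |
| 8 | re : | Event | 0.802971 | 45 | 48 |
| 9 | Invent 행사가 | Event | 0.731569 | 48 | 58 |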
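Applications that do not use the SageMaker SDK `predictor` can call the same endpoint through the runtime API. The sketch below is an assumption-laden illustration: `invoke_ner_endpoint` is a hypothetical helper, `runtime_client` is assumed to be `boto3.client("sagemaker-runtime")`, and the JSON Lines request and response format mirrors the serializer and deserializer used in this lab.

```python
import json


def invoke_ner_endpoint(runtime_client, endpoint_name, records):
    """Send records as JSON Lines and parse the JSON Lines response."""
    body = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/jsonlines",
        Accept="application/jsonlines",
        Body=body.encode("utf-8"),
    )
    # The response body is a stream; read it fully, then parse line by line.
    raw = response["Body"].read().decode("utf-8")
    return [json.loads(line) for line in raw.splitlines() if line.strip()]
```

Each returned element is one entity dictionary with keys such as `entity_group`, `score`, `word`, `start`, and `end`, which is the shape `display_ner_outputs` expects.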


## Clean up

Delete the endpoint and the model to avoid incurring further charges.

In [28]:
predictor.delete_endpoint()
model.delete_model()

[{session.py:3113} INFO - Deleting endpoint configuration with name: pytorch-inference-2022-06-13-08-59-05-123
[{session.py:3103} INFO - Deleting endpoint with name: pytorch-inference-2022-06-13-08-59-05-123
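For resources created outside the SageMaker SDK, the same cleanup can be sketched with the boto3 client. `delete_endpoint_resources` is a hypothetical helper and `sm_client` is assumed to be `boto3.client("sagemaker")`; the three names are whatever you used when creating the resources.

```python
def delete_endpoint_resources(sm_client, endpoint_name, config_name, model_name):
    """Delete the endpoint, then its configuration, then the model."""
    # The endpoint is what incurs instance charges, so remove it first.
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    sm_client.delete_model(ModelName=model_name)
```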
