# OpenCALM SageMaker Inference for JAQKET dataset

This notebook hosts [OpenCALM](https://huggingface.co/spaces/kyo-takano/OpenCALM-7B) on a SageMaker inference endpoint and collects the model's answers to the [JAQKET](https://www.nlp.ecei.tohoku.ac.jp/projects/jaqket/) evaluation dataset.

Hosting and inference were verified in the following environment.

* `ml.g5.2xlarge` (NVIDIA A10G Tensor Core GPU with 24 GB VRAM, 32 GB RAM, 8 vCPUs) : `PyTorch 1.13 Python 3.9 GPU Optimized`

See the [SageMaker pricing page](https://aws.amazon.com/jp/sagemaker/pricing/) for the cost of each instance type.

In [None]:
!pip install "sagemaker>=2.143.0" -U

In [None]:
!pip install tqdm

In [None]:
import sagemaker, boto3, json
from sagemaker import get_execution_role
from sagemaker.pytorch.model import PyTorchModel
from sagemaker.huggingface import HuggingFace

role = get_execution_role()
region = boto3.Session().region_name
sess = sagemaker.Session()
bucket = sess.default_bucket()

sagemaker.__version__

## Package and Upload Model

In [None]:
!rm -rf scripts/model
%cd scripts
!tar -czvf ../package.tar.gz *
%cd -
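
The shell cell above puts every visible entry of `scripts/` at the root of `package.tar.gz`. For environments without a shell, the same packaging can be sketched with Python's standard `tarfile` module. The helper name `package_dir` and the file name `inference.py` are placeholders for illustration, not names confirmed by this notebook.

```python
import os
import tarfile
import tempfile


def package_dir(src_dir, archive_path):
    """Archive the visible entries of src_dir at the archive root, like `tar -czvf ... *`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            if name.startswith("."):
                continue  # the shell glob `*` skips hidden files as well
            tar.add(os.path.join(src_dir, name), arcname=name)
    # Re-open the archive to report what ended up inside it.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()


# Demo on a throwaway directory; "inference.py" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "scripts")
    os.makedirs(src)
    with open(os.path.join(src, "inference.py"), "w") as f:
        f.write("# placeholder\n")
    names = package_dir(src, os.path.join(tmp, "package.tar.gz"))

print(names)
```

Listing the archive members before uploading is a cheap way to confirm that the entry files sit at the archive root rather than under an extra directory level.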

In [None]:
model_path = sess.upload_data("package.tar.gz", bucket=bucket, key_prefix="OpenCALM")
model_path

## Deploy Model

In [None]:
model_name = "cyberagent/open-calm-7b"
model_name_base = model_name.split("/")[-1]

In [None]:
from sagemaker.serializers import JSONSerializer

huggingface_model = PyTorchModel(
    model_data=model_path,
    framework_version="1.13",
    py_version="py39",
    role=role,
    name=model_name_base,
    env={
        "model_params": json.dumps(
            {
                "base_model": model_name,
                "peft": False,
                "load_8bit": False,
                "prompt_template": "simple_qa_ja",
            }
        ),
        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600"
    },
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.2xlarge',
    endpoint_name=model_name_base,
    serializer=JSONSerializer()
)
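
Container environment variables are plain strings, which is presumably why the model settings are packed into a single `model_params` value with `json.dumps`. How the code under `scripts/` reads that value is not shown in this notebook; the sketch below assumes it decodes the string with `json.loads`. The helper `read_model_params` is hypothetical.

```python
import json

# The same settings the deploy cell passes, encoded as one string.
model_params = {
    "base_model": "cyberagent/open-calm-7b",
    "peft": False,
    "load_8bit": False,
    "prompt_template": "simple_qa_ja",
}
env = {
    "model_params": json.dumps(model_params),
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
}


def read_model_params(environ):
    """Hypothetical container-side helper: decode the JSON string back into a dict."""
    return json.loads(environ.get("model_params", "{}"))


decoded = read_model_params(env)
# Booleans survive the round trip as real bools, not as the strings "False"/"True".
print(decoded["peft"], type(decoded["peft"]).__name__)
```

Encoding the whole dictionary as JSON avoids hand-parsing values such as `"False"`, which would be truthy if treated as a plain string.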

## Run Inference

In [None]:
from sagemaker.predictor import Predictor
from sagemaker.predictor_async import AsyncPredictor
from sagemaker.deserializers import JSONDeserializer

predictor_client = Predictor(
    endpoint_name=model_name_base,
    sagemaker_session=sess,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

In [None]:
import re


def inference(instruction):
    data = {
        "instruction": instruction,
        "input": "",
        "max_new_tokens": 32,
        "temperature": 0.1,
        "do_sample": False,
        "num_beams": 5,
        "pad_token_id": 1,
        "bos_token_id": 0,
        "eos_token_id": 0,
        # "repetition_penalty": 1.05,
        "stop_ids": [1, 0],
    }
    response = predictor_client.predict(data=data)
    answer = ""
    try:
        answer = re.findall("「(.*?)」", f"「{response}")[-1]
    except IndexError:
        answer = response
    return answer
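
The extraction step in `inference` can be tried without an endpoint. The sketch below isolates it, assuming the prompt template ends with an opening `「` so that the model's output continues with the answer and a closing `」`. That assumption is inferred from the prepended bracket in the regex call, not confirmed by the template itself. `extract_answer` is a name introduced here for illustration.

```python
import re


def extract_answer(response):
    """Return the text inside the last 「...」 pair, or the raw response if none is found."""
    # Prepend the opening bracket that the prompt is assumed to end with,
    # then capture everything up to the next closing bracket.
    found = re.findall("「(.*?)」", f"「{response}")
    return found[-1] if found else response


# Hypothetical model outputs, used only to exercise the function.
closed = extract_answer("ジェット団」")
unclosed = extract_answer("ジェット団")
print(closed, unclosed)
```

When the output contains no closing bracket, the raw response is returned unchanged, which mirrors the `IndexError` fallback in the notebook's function.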

In [None]:
print(inference("映画『ウエスト・サイド物語』に登場する2つの少年グループといえば、シャーク団と何団?"))

Download the JAQKET dataset.

In [None]:
!wget -P data https://jaqket.s3.ap-northeast-1.amazonaws.com/data/aio_02/aio_02_dev_v1.0.jsonl
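
The downloaded file is in JSON Lines format, with one JSON object per line. The evaluation code in this notebook reads only the `question` and `answers` fields. A minimal sketch of that layout follows; the two records are invented for illustration and are not taken from JAQKET.

```python
import json

# Hypothetical records; only the "question" and "answers" keys come from the notebook.
sample_jsonl = "\n".join(
    [
        json.dumps({"question": "日本で一番高い山は?", "answers": ["富士山"]}, ensure_ascii=False),
        json.dumps({"question": "日本の首都は?", "answers": ["東京", "東京都"]}, ensure_ascii=False),
    ]
)


def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


records = parse_jsonl(sample_jsonl)
for record in records:
    print(record["question"], "->", record["answers"])
```

Because `answers` is a list, a question can accept several surface forms of the same answer.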

In [None]:
import pandas as pd
from tqdm import tqdm


df = pd.read_json("data/aio_02_dev_v1.0.jsonl", orient="records", lines=True)

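llm_answers = []
matches = []
# Pass total explicitly so tqdm can show progress; iterrows() has no length of its own.
for idx, row in tqdm(df.iterrows(), total=len(df)):
    llm_answer = inference(row["question"])
    llm_answers.append(llm_answer)
    # Exact match: the extracted answer must equal one of the reference answers.
    matches.append(llm_answer in row["answers"])


df["llm_answers"] = pd.Series(llm_answers)
df["match"] = pd.Series(matches)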

In [None]:
print(df.match.sum(), "/", len(df))
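
The count printed above is an exact-match score: a prediction counts only if it is identical to one of the reference answers. A plain-Python sketch of the same metric expressed as a ratio is shown below. The function names and toy data are illustrative and not part of the notebook.

```python
def is_match(prediction, answers):
    """Exact match: the prediction must be identical to one of the reference answers."""
    return prediction in answers


def accuracy(predictions, references):
    """Fraction of predictions that exactly match one of their reference answers."""
    hits = [is_match(p, a) for p, a in zip(predictions, references)]
    return sum(hits) / len(hits) if hits else 0.0


# Toy data for illustration only.
predictions = ["ジェット団", "東京都", "阿蘇山"]
references = [["ジェット団"], ["東京", "東京都"], ["富士山"]]
score = accuracy(predictions, references)
print(f"{score:.3f}")
```

Exact matching is strict: an answer with extra whitespace or a different script would be counted as wrong, so the reported score is a conservative estimate of how often the model is right.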

In [None]:
df.to_csv(f"data/{model_name_base}_inference.csv", index=False)

## Delete Endpoint

In [None]:
predictor.delete_model()
predictor.delete_endpoint()