# ELECTRA

TensorFlow 2.1 implementation of pretraining and finetuning scripts for ELECTRA.

The original paper: [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555)

* We pretrain for 125,000 steps with a total batch size of 1024 and a maximum sequence length of 512 across 8 p3dn.24xlarge nodes.
* We finetune on SQuAD v2.0 for 5,430 steps with a total batch size of 48 on a single p3dn.24xlarge node. The SQuAD F1 score combines the precision and recall of each word in the predicted answer and ranges from 0 to 100.

| Model | Total Training Time | SQuAD v2.0 EM | SQuAD v2.0 F1 |
| --- | --- | --- | --- |
| ELECTRA-small | 11 hrs 40 min | 68.27 | 71.56 |

### How To Launch Training

All commands should be run from the `models/nlp` directory.

1. Create an FSx volume.

2. Download the datasets onto FSx. The simplest way to start is with English Wikipedia.

3. Create an Amazon Elastic Container Registry (ECR) repository. Then build a Docker image from `models/nlp/Dockerfile` and push it to ECR.

```bash
# Fill in your AWS account ID and the name of your ECR repository
export ACCOUNT_ID=
export REPO=
export IMAGE=${ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${REPO}:py37_tf211
docker build -t ${IMAGE} .
# AWS-CLI v1
$(aws ecr get-login --no-include-email)
# AWS-CLI v2
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com
docker push ${IMAGE}
```

4. Define environment variables that point to the FSx volume and your SageMaker resources. For parameters that take multiple values, such as security group IDs, use a comma-separated string.

```bash
export SAGEMAKER_ROLE=arn:aws:iam::${ACCOUNT_ID}:role/service-role/AmazonSageMaker-ExecutionRole-20200101T123
export SAGEMAKER_IMAGE_NAME=${IMAGE}
export SAGEMAKER_FSX_ID=fs-123
export SAGEMAKER_SUBNET_IDS=subnet-123
export SAGEMAKER_SECURITY_GROUP_IDS=sg-123,sg-456
```

5. Define the ELECTRA-specific run name, step count, and data/output paths.

```bash
export RUN_NAME=myelectrapretraining
export TOTAL_STEPS=125000
# The data should be in TFRecords inside $TRAIN_DIR on the FSx volume
export TRAIN_DIR=electra_data/train
export VAL_DIR=electra_data/val
export LOG_DIR=logs/electra
export CHECKPOINT_DIR=checkpoints/electra
```

6. Launch the SageMaker ELECTRA pretraining job.

```bash
python -m albert.launch_sagemaker \
    --source_dir=. \
    --entry_point=electra/run_pretraining.py \
    --sm_job_name=electra-pretrain \
    --instance_type=ml.p3dn.24xlarge \
    --instance_count=8 \
    --train_dir=${TRAIN_DIR} \
    --val_dir=${VAL_DIR} \
    --log_dir=${LOG_DIR} \
    --checkpoint_dir=${CHECKPOINT_DIR} \
    --load_from=scratch \
    --model_type=electra \
    --model_size=small \
    --attention_probs_dropout_prob=0.1 \
    --hidden_dropout_prob=0.1 \
    --checkpoint_frequency=10000 \
    --per_gpu_batch_size=16 \
    --max_seq_len=512 \
    --learning_rate=2e-3 \
    --end_learning_rate=4e-4 \
    --weight_decay=0.01 \
    --warmup_steps=10000 \
    --validation_frequency=10000 \
    --total_steps=${TOTAL_STEPS} \
    --log_frequency=2000 \
    --run_name=${RUN_NAME} \
    --name=myelectra
```

7. Launch a SageMaker finetuning job on SQuAD v2.0. Finetuning loads the discriminator weights saved at the end of pretraining; a sketch for verifying that checkpoint path on FSx follows the command below.

```bash
python -m albert.launch_sagemaker \
    --source_dir=. \
    --entry_point=albert/run_squad.py \
    --sm_job_name=electra-squadv2 \
    --instance_type=ml.p3dn.24xlarge \
    --instance_count=1 \
    --load_from=checkpoint \
    --checkpoint_path=checkpoints/electra/${RUN_NAME}-step${TOTAL_STEPS}-discriminator \
    --model_type=electra \
    --model_size=small \
    --per_gpu_batch_size=6 \
    --weight_decay=0 \
    --squad_version=squadv2 \
    --learning_rate=40e-5 \
    --warmup_steps=543 \
    --total_steps=5430 \
    --validation_frequency=50000 \
    --evaluate_frequency=50000 \
    --skip_xla=true
```
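Before launching finetuning, it can help to confirm that the checkpoint referenced by `--checkpoint_path` actually exists on the FSx volume. This is a minimal sketch, assuming the volume is mounted at `/fsx` and that pretraining writes checkpoints named `${RUN_NAME}-step${TOTAL_STEPS}-discriminator` under `${CHECKPOINT_DIR}`, as the flags above imply; adjust the mount point to your setup. `DISC_CHECKPOINT` is just a hypothetical helper variable for the check.

```bash
# Assumed layout: FSx mounted at /fsx, checkpoints under ${CHECKPOINT_DIR}
export DISC_CHECKPOINT=${CHECKPOINT_DIR}/${RUN_NAME}-step${TOTAL_STEPS}-discriminator
if ls /fsx/${DISC_CHECKPOINT}* > /dev/null 2>&1; then
    echo "Found discriminator checkpoint: ${DISC_CHECKPOINT}"
else
    echo "No checkpoint at /fsx/${DISC_CHECKPOINT}; check RUN_NAME and TOTAL_STEPS" >&2
fi
```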
8. Enter the Docker container to debug and edit code.

```bash
docker run -it --privileged -v /fsx:/fsx --gpus=all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --rm ${IMAGE} /bin/bash
```

### Command-Line Parameters

See [common/arguments.py](common/arguments.py) for full details.
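To browse the accepted flags without launching a job, you can usually print the script help from inside the container (step 8). This is a sketch that assumes the entry points build a standard `argparse` parser from `common/arguments.py` and can be invoked as modules from the `models/nlp` source directory; if not, run the `.py` files directly instead.

```bash
# Inside the container, from the models/nlp source directory
python -m electra.run_pretraining --help   # pretraining flags
python -m albert.run_squad --help          # SQuAD finetuning flags
```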