## FastQC Batch Workshop

In this workshop we will build an [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html) environment and submit FastQC jobs to it, leveraging cloud-native job scheduling. We will use the [1000 Genomes](https://registry.opendata.aws/1000-genomes/) data set from the [AWS Open Data Registry](https://registry.opendata.aws/) and run FastQC against a [FASTQ formatted](https://en.wikipedia.org/wiki/FASTQ_format) file from that data set.

## **If multiple users are running in the same account, update `workshop_user` with your unique username to avoid collisions**

In [None]:
import boto3
import botocore
import json
import time
import os
import base64
import docker
import pandas as pd

import project_path # path to helper methods
from lib import workshop
from botocore.exceptions import ClientError

ecr = boto3.client('ecr')
cfn = boto3.client('cloudformation')
ec2_client = boto3.client('ec2')
batch = boto3.client('batch')
iam = boto3.client('iam')
ssm = boto3.client('ssm')

session = boto3.session.Session()
region = session.region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

workshop_user = 'hpc' # no capitals all lower case
batch_sec_group_name = 'FastQBatchSG_' + workshop_user
repo = 'fastqc_demo_' + workshop_user
job_def_name = 'fastqc_demo_job_' + workshop_user
instance_profile_name = 'FastQInstanceProfile_' + workshop_user
iam_stack_name = 'FastQCIAMRolesStack-' + workshop_user 
default_env = 'FastQCEnvironment' + '_' + workshop_user
bid_percentage = 100
desired_cpu = 4

use_existing = True

### [Create VPC](https://docs.aws.amazon.com/vpc/index.html) 

The Batch compute environment needs a VPC and subnets to launch its instances in. When `use_existing` is `True`, the code below looks up the default VPC and its first two subnets; otherwise it creates a new VPC with public subnets. A subnet is made public by attaching an Internet Gateway to the VPC and creating a route table with an entry that routes all traffic for `0.0.0.0/0` to the Internet Gateway. We will store the VPC and subnet IDs to be used later in the notebook.
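
As a rough illustration of the public-subnet rule described above, the sketch below checks a route table for a default route that targets an Internet Gateway. The `sample_route_table` dictionary is made-up data shaped like one entry of an EC2 `describe_route_tables` response, and `has_internet_route` is a hypothetical helper, not part of the workshop library.

```python
def has_internet_route(route_table):
    """Return True if the route table sends 0.0.0.0/0 to an Internet Gateway."""
    for route in route_table.get('Routes', []):
        is_default = route.get('DestinationCidrBlock') == '0.0.0.0/0'
        via_igw = route.get('GatewayId', '').startswith('igw-')
        if is_default and via_igw:
            return True
    return False


# Hypothetical data for illustration only; the IDs are not real resources.
sample_route_table = {
    'RouteTableId': 'rtb-0example',
    'Routes': [
        {'DestinationCidrBlock': '10.0.0.0/16', 'GatewayId': 'local'},
        {'DestinationCidrBlock': '0.0.0.0/0', 'GatewayId': 'igw-0example'},
    ],
}

print(has_internet_route(sample_route_table))
```

A route table containing only the `local` route would make the helper return `False`, which is the case for a private subnet.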

In [None]:
if use_existing:
 vpc_filter = [{'Name':'isDefault', 'Values':['true']}]
 default_vpc = ec2_client.describe_vpcs(Filters=vpc_filter)
 vpc_id = default_vpc['Vpcs'][0]['VpcId']

 subnet_filter = [{'Name':'vpc-id', 'Values':[vpc_id]}]
 subnets = ec2_client.describe_subnets(Filters=subnet_filter)
 subnet1_id = subnets['Subnets'][0]['SubnetId']
 subnet2_id = subnets['Subnets'][1]['SubnetId']
else: 
 vpc, subnet1, subnet2 = workshop.create_and_configure_vpc()
 vpc_id = vpc.id
 subnet1_id = subnet1.id
 subnet2_id = subnet2.id

In [None]:
print(vpc_id)
print(subnet1_id)
print(subnet2_id)
print(region)

### [Create S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html)

We will create an S3 bucket that will be used throughout the workshop for storing our data.

[s3.create_bucket](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.create_bucket) boto3 documentation

In [None]:
bucket = workshop.create_bucket_name('batch-')
s3_resource = session.resource('s3')
if region == 'us-east-1':
    # us-east-1 is the default location and must not be passed as a LocationConstraint
    s3_resource.create_bucket(Bucket=bucket)
else:
    s3_resource.create_bucket(Bucket=bucket, CreateBucketConfiguration={'LocationConstraint': region})

print(bucket)

### Create the bash script to run in container

Create the shell script used to run fastqc and send the output to our S3 bucket for analysis. The `%%writetemplate` cell magic defined below fills in the **`{bucket}`** and **`{region}`** placeholders from the notebook variables of the same name. The script runs the fastqc process on a fastq file from the 1000 Genomes data set and sends the results to an S3 bucket for further inspection.
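
The placeholder substitution can be tried on its own with `str.format`, which is what the cell magic below relies on. This is a minimal sketch: the bucket name and region are hypothetical values, and the template is a shortened version of the script.

```python
template = (
    "aws s3 cp $1 .\n"
    "aws s3 mv *.html s3://{bucket} --acl public-read\n"
    "echo OUTPUT: https://s3.{region}.amazonaws.com/{bucket}/$report\n"
)

# Hypothetical values; in the notebook these come from the `bucket` and `region` variables.
rendered = template.format(bucket='batch-example-bucket', region='us-east-1')
print(rendered)

# Literal braces must be doubled so that str.format leaves them in place.
escaped = "echo ${{HOME}}".format()
print(escaped)
```

Shell variables such as `$1` and `$report` pass through untouched because they contain no braces; a shell expression that does use braces would need them doubled as in the last line.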

In [None]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w+') as f:
        f.write(cell.format(**globals()))


In [None]:
%%writetemplate fastqc.sh
#! /bin/bash
aws s3 cp $1 .
filename=$(basename $1)
fastqc $filename
report=$(ls *.html)
aws s3 mv *.zip s3://{bucket} --acl public-read
aws s3 mv *.html s3://{bucket} --acl public-read
rm $filename
echo OUTPUT: https://s3.{region}.amazonaws.com/{bucket}/$report

### Create the Dockerfile for FastQC

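Note: the CMD line will be overridden during `docker run` by the command arguments you pass, here the desired filename as the input parameter. ENTRYPOINT, however, is not replaced by those arguments; overriding it requires the `--entrypoint` flag.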
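
The interaction between ENTRYPOINT, CMD, and run-time arguments can be modelled in a few lines. This is a simplified sketch that only covers exec-form instructions (shell-form CMD is additionally wrapped in a shell by Docker); `effective_command` is a hypothetical helper written for this illustration.

```python
def effective_command(entrypoint, cmd, run_args):
    """Model the argv a container starts with (exec-form ENTRYPOINT/CMD only).

    Arguments passed at run time replace CMD; ENTRYPOINT is kept as the prefix.
    """
    chosen = run_args if run_args else cmd
    return list(entrypoint) + list(chosen)


prefix = 's3://1000genomes/phase3/data/NA21144/sequence_read/'
default_cmd = ['fastqc.sh', prefix + 'ERR047877.filt.fastq.gz']
override = ['fastqc.sh', prefix + 'ERR047878.filt.fastq.gz']

# No run-time arguments: the image's CMD is used.
print(effective_command([], default_cmd, []))
# Run-time arguments given: they replace CMD entirely.
print(effective_command([], default_cmd, override))
```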

In [None]:
%%writefile Dockerfile
FROM biocontainers/fastqc:v0.11.5_cv3
USER root
ADD fastqc.sh /home/biodocker/bin/fastqc.sh
RUN chown -v biodocker /home/biodocker/bin/fastqc.sh && chmod -v 764 /home/biodocker/bin/fastqc.sh && pip install awscli
USER biodocker
ENV PATH /home/biodocker/.local/bin:$PATH
CMD fastqc.sh s3://1000genomes/phase3/data/NA21144/sequence_read/ERR047877.filt.fastq.gz
 

### [Create the ECR Repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/Repositories.html)

Amazon Elastic Container Registry (Amazon ECR) provides API operations to create, monitor, and delete image repositories and set permissions that control who can access them. You can perform the same actions in the Repositories section of the Amazon ECR console. Amazon ECR also integrates with the Docker CLI allowing you to push and pull images from your development environments to your repositories.

[ecr.create_repository](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecr.html#ECR.Client.create_repository)

In [None]:
try:
    response = ecr.create_repository(
        repositoryName=repo
    )
except ClientError as e:
    if e.response['Error']['Code'] == 'RepositoryAlreadyExistsException':
        print("Repo exists, skip")
    else:
        raise e

### Build container image and upload to ECR

To work with ECR you need to retrieve an authorization token, which is valid for the specified registry for 12 hours. With the token you can use the `docker` CLI to push and pull images with Amazon ECR. If you do not specify a registry, the default registry is assumed.
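
The authorization token is a base64-encoded `username:password` pair, where the username is `AWS`. The sketch below decodes a fake token built locally, so it runs without AWS access; the password is a made-up placeholder.

```python
import base64

# Build a fake token locally; a real one comes from ecr.get_authorization_token().
fake_password = 'not-a-real-password:with-colon'
fake_token = base64.b64encode(('AWS:' + fake_password).encode('utf-8'))

decoded = base64.b64decode(fake_token).decode('utf-8')
# Split only on the first colon so a password containing ':' stays intact.
username, password = decoded.split(':', 1)
print(username)
```

Splitting with a limit of one is a defensive choice in this sketch; a plain `split(':')` behaves the same whenever the password contains no colon.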

We will use the [Docker SDK for Python](https://docker-py.readthedocs.io/en/stable/) to build and push the image to the ECR repository.

[ecr.get_authorization_token](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecr.html#ECR.Client.get_authorization_token)

Note: SageMaker notebook instances already have Docker installed and `dockerd` running. This part will not work in SageMaker Studio, where the `docker` command is not installed; there you will have to use AWS CodeBuild to build the image and push it to ECR.

In [None]:
login = ecr.get_authorization_token()
b64token = login['authorizationData'][0]['authorizationToken'].encode('utf-8')
username, password = base64.b64decode(b64token).decode('utf-8').split(':')
registry = login['authorizationData'][0]['proxyEndpoint']

client = docker.from_env()
client.login(username, password, registry=registry)

img, logs = client.images.build(path='.', tag=repo)
registry_with_name = registry.replace('https://', '') + '/' + repo
print(registry_with_name)
img.tag(registry_with_name, tag='latest')
client.images.push(registry_with_name, tag='latest')

In [None]:
print('https://{0}.console.aws.amazon.com/ecr/repositories/{1}/?region={0}'.format(region, repo))

### [Create the IAM roles required for AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/IAM_policies.html)

By default, IAM users don't have permission to create or modify AWS Batch resources, or perform tasks using the AWS Batch API. This means that they also can't do so using the AWS Batch console or the AWS CLI. To allow IAM users to create or modify resources and submit jobs, you must create IAM policies that grant IAM users permission to use the specific resources and API actions they need. Then, attach those policies to the IAM users or groups that require those permissions.

### Upload [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/GettingStarted.html) template

In the interest of time we will generate the IAM Roles required with a CloudFormation template.

In [None]:
!cat fastq-batch-roles.yaml

In [None]:
iam_file = 'fastq-batch-roles.yaml'
session.resource('s3').Bucket(bucket).Object(os.path.join('cfn', iam_file)).upload_file(iam_file)

### Execute CloudFormation Stack for IAM Roles

Creates a stack as specified in the template. After the call completes successfully, the stack creation starts. You can check the status of the stack via the DescribeStacks API.

[cfn.create_stack](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/cloudformation.html#CloudFormation.Client.create_stack)

In [None]:
# The CloudFormation template we use needs AmazonEC2SpotFleetRole, ref: https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html#spot-fleet-roles-console
iam_client = session.client('iam')

try:
    resp = iam_client.create_role(RoleName='AmazonEC2SpotFleetRole',
                                  AssumeRolePolicyDocument='{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":"spotfleet.amazonaws.com"},"Action":"sts:AssumeRole"}]}')
except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print('AmazonEC2SpotFleetRole already exists, ignore')
    else:
        raise

try:
    resp = iam_client.attach_role_policy(PolicyArn='arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole',
                                         RoleName='AmazonEC2SpotFleetRole')
except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print('Policy already attached to AmazonEC2SpotFleetRole, ignore')
    else:
        raise

# Service-linked roles for Spot and Spot Fleet, following the AWS Batch Spot Fleet role documentation linked above.
# IAM returns 'InvalidInput' when the service-linked role already exists.
for service_name in ('spot.amazonaws.com', 'spotfleet.amazonaws.com'):
    try:
        resp = iam_client.create_service_linked_role(AWSServiceName=service_name)
    except ClientError as e:
        if e.response['Error']['Code'] == 'InvalidInput':
            print('Linked role for {} already exists, ignore'.format(service_name))
        else:
            raise

resp = iam_client.get_role(RoleName='AmazonEC2SpotFleetRole')
batch_spot_role_arn = resp['Role']['Arn']

In [None]:
# Changed to use TemplateBody instead of pulling the template from S3
#cfn_template = 'https://s3-{0}.amazonaws.com/{1}/cfn/{2}'.format(region, bucket, iam_file)
#print(cfn_template)

with open(iam_file) as tf:
 template_data = tf.read()
 cfn.validate_template(TemplateBody=template_data)
 
 response = cfn.create_stack(
 StackName=iam_stack_name,
# TemplateURL=cfn_template,
 TemplateBody=template_data,
 Capabilities = ["CAPABILITY_NAMED_IAM"],
 Parameters=[
 {
 'ParameterKey': 'S3Bucket',
 'ParameterValue': bucket
 }
 ]
 )

 print(response)

In [None]:
print('waiting for stack complete...')
waiter = cfn.get_waiter('stack_create_complete')
waiter.wait(
 StackName=iam_stack_name
)
print('stack complete.')

### [Get Outputs of the CloudFormation template](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/outputs-section-structure.html)

The optional `Outputs` section declares output values that you can import into other stacks, return in responses to describe-stack calls, or view on the AWS CloudFormation console. We provide outputs for the names and ARNs of the roles required by the AWS Batch services.
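
One compact way to work with stack outputs is to turn the `Outputs` list into a dictionary keyed by `OutputKey`. The sketch below uses a made-up response shaped like the result of `describe_stacks`; the role name and account ID are placeholders, and `outputs_to_dict` is a hypothetical helper.

```python
def outputs_to_dict(describe_stacks_response):
    """Map OutputKey to OutputValue for the first stack in the response."""
    stack = describe_stacks_response['Stacks'][0]
    return {o['OutputKey']: o['OutputValue'] for o in stack.get('Outputs', [])}


# Hypothetical response for illustration only.
sample_response = {
    'Stacks': [{
        'StackName': 'FastQCIAMRolesStack-hpc',
        'Outputs': [
            {'OutputKey': 'BatchTaskRole', 'OutputValue': 'example-task-role'},
            {'OutputKey': 'BatchTaskRoleArn',
             'OutputValue': 'arn:aws:iam::123456789012:role/example-task-role'},
        ],
    }]
}

stack_outputs = outputs_to_dict(sample_response)
print(stack_outputs['BatchTaskRoleArn'])
```

A missing key then surfaces as a `KeyError` at lookup time rather than as an undefined variable later in the notebook.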

In [None]:
response = cfn.describe_stacks(StackName=iam_stack_name)

outputs = response["Stacks"][0]["Outputs"]

# Pick out the role names and ARNs that the Batch resources below refer to.
for output in outputs:
    if output['OutputKey'] == 'BatchTaskRole':
        batch_task_role = output['OutputValue']
    if output['OutputKey'] == 'BatchTaskRoleArn':
        batch_task_role_arn = output['OutputValue']
    if output['OutputKey'] == 'BatchInstanceRole':
        batch_instance_role = output['OutputValue']
    if output['OutputKey'] == 'BatchInstanceRoleArn':
        batch_instance_role_arn = output['OutputValue']
    if output['OutputKey'] == 'BatchServiceRole':
        batch_service_role = output['OutputValue']
    if output['OutputKey'] == 'BatchServiceRoleArn':
        batch_service_role_arn = output['OutputValue']
#    if output['OutputKey'] == 'BatchSpotFleetRole':
#        batch_spot_role = output['OutputValue']
#    if output['OutputKey'] == 'BatchSpotFleetRoleArn':
#        batch_spot_role_arn = output['OutputValue']

# None disables column truncation; negative values are rejected by recent pandas versions.
pd.set_option('display.max_colwidth', None)
pd.DataFrame(outputs, columns=["OutputKey", "OutputValue"])

### [Create Instance Profile for Batch instances](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html)

An instance profile is a container for an IAM role that you can use to pass role information to an EC2 instance when the instance starts.

[iam.create_instance_profile](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam.html#IAM.Client.create_instance_profile)

In [None]:
iam.create_instance_profile(
 InstanceProfileName=instance_profile_name
)

### Associate IAM Role with instance profile

[iam.add_role_to_instance_profile](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/iam.html#IAM.Client.add_role_to_instance_profile)

In [None]:
iam.add_role_to_instance_profile(
 InstanceProfileName=instance_profile_name,
 RoleName=batch_instance_role
)

### [Create Security Group](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html)


A security group acts as a virtual firewall for your instance to control inbound and outbound traffic. When you launch an instance in a VPC, you can assign up to five security groups to the instance. Security groups act at the instance level, not the subnet level. Therefore, each instance in a subnet in your VPC could be assigned to a different set of security groups. If you don't specify a particular group at launch time, the instance is automatically assigned to the default security group for the VPC.

[ec2_client.create_security_group](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.create_security_group) boto3 documentation

In [None]:
try:
    sg = ec2_client.create_security_group(
        Description='security group for Compute Environment',
        GroupName=batch_sec_group_name,
        VpcId=vpc_id
    )
    batch_sec_group_id = sg["GroupId"]
except ClientError as e:
    if e.response['Error']['Code'] == 'InvalidGroup.Duplicate':
        print("SG already exists, reusing it")
        resp = ec2_client.describe_security_groups(Filters=[
            dict(Name='group-name', Values=[batch_sec_group_name]),
            dict(Name='vpc-id', Values=[vpc_id])])
        batch_sec_group_id = resp['SecurityGroups'][0]['GroupId']
    else:
        raise

print('Batch security group name - ' + batch_sec_group_name)
print('Batch security group id - ' + batch_sec_group_id)

### [Create the Batch Environment](https://docs.opendata.aws/genomics-workflows/aws-batch/configure-aws-batch-cfn/)

We will create the required AWS Batch environment for genomics workflows in the next few cells. It will be used to run the job requests we submit for the FastQC container.

[batch.create_compute_environment](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.create_compute_environment)

In [None]:
def create_compute_environment(computeEnvironmentName, computeType, unitVCpus, imageId, serviceRole, instanceRole,
                               subnets, securityGroups, bidPercentage=None, spotFleetRole=None):

    compute_resources = {
        'type': computeType,
        'imageId': imageId,
        'minvCpus': unitVCpus * 1,
        'maxvCpus': unitVCpus * 16,
        'desiredvCpus': unitVCpus * 1,
        'instanceTypes': ['optimal'],
        'subnets': subnets,
        'securityGroupIds': securityGroups,
        'instanceRole': instanceRole
    }

    # Spot environments additionally need a bid percentage and the Spot Fleet role.
    if computeType == 'SPOT':
        compute_resources['bidPercentage'] = bidPercentage
        compute_resources['spotIamFleetRole'] = spotFleetRole

    response = batch.create_compute_environment(
        computeEnvironmentName=computeEnvironmentName,
        type='MANAGED',
        serviceRole=serviceRole,
        computeResources=compute_resources
    )

    # Poll until the environment becomes VALID, or fail if it turns INVALID.
    while True:
        describe = batch.describe_compute_environments(computeEnvironments=[computeEnvironmentName])
        computeEnvironment = describe['computeEnvironments'][0]
        status = computeEnvironment['status']
        if status == 'VALID':
            print('\rSuccessfully created compute environment {}'.format(computeEnvironmentName))
            break
        elif status == 'INVALID':
            reason = computeEnvironment['statusReason']
            raise Exception('Failed to create compute environment: {}'.format(reason))
        print('\rCreating compute environment...')
        time.sleep(5)

    return response

### Get Latest [Amazon Linux AMI](https://aws.amazon.com/amazon-linux-ami/)

The Amazon Linux 2 AMI is a supported and maintained Linux image provided by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for applications running on Amazon EC2. It supports the latest EC2 instance type features and includes packages that enable easy integration with AWS. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Amazon Linux AMI is provided at no additional charge to Amazon EC2 users. The specific AMI we are using is the ECS-optimized version, which is needed for AWS Batch.
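
The recommended-AMI parameter stores its value as a JSON string, so the image ID has to be parsed out of it. The sketch below works on a made-up response shaped like the output of `ssm.get_parameters`; the AMI ID and image name are placeholders, and `recommended_image_id` is a hypothetical helper.

```python
import json

# Hypothetical response for illustration; the AMI ID and image name are made up.
sample_response = {
    'Parameters': [{
        'Name': '/aws/service/ecs/optimized-ami/amazon-linux-2/recommended',
        'Value': json.dumps({
            'image_id': 'ami-0123456789abcdef0',
            'image_name': 'example-ecs-optimized-image',
        }),
    }],
    'InvalidParameters': [],
}


def recommended_image_id(response):
    """Extract image_id from the JSON value of the first returned parameter."""
    if not response['Parameters']:
        raise ValueError('parameter not found: {}'.format(response.get('InvalidParameters')))
    return json.loads(response['Parameters'][0]['Value'])['image_id']


print(recommended_image_id(sample_response))
```

Checking for an empty `Parameters` list gives a clearer error than an `IndexError` when the parameter name is wrong or unavailable in the region.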

In [None]:
response = ssm.get_parameters(Names=['/aws/service/ecs/optimized-ami/amazon-linux-2/recommended'])
ami = json.loads(response['Parameters'][0]['Value'])['image_id']
print(ami)

### [Create Batch Compute Environment](https://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html)

Compute environments contain the Amazon ECS container instances that are used to run containerized batch jobs. A given compute environment can also be mapped to one or many job queues. Within a job queue, the associated compute environments each have an order that is used by the scheduler to determine where to place jobs that are ready to be executed.

[batch.create_compute_environment](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.create_compute_environment)

In [None]:
security_groups = [batch_sec_group_id]

resp = create_compute_environment(default_env, 'SPOT', desired_cpu, ami, batch_service_role_arn, instance_profile_name, \
 [subnet1_id], security_groups, bid_percentage, batch_spot_role_arn)

default_ce_arn = resp['computeEnvironmentArn']
default_ce = resp['computeEnvironmentName']
print(default_ce_arn)

### [Create the AWS Batch Job Queue](https://docs.aws.amazon.com/batch/latest/userguide/create-job-queue.html)

Jobs are submitted to a job queue, where they reside until they are able to be scheduled to run in a compute environment. An AWS account can have multiple job queues. For example, you might create a queue that uses Amazon EC2 On-Demand instances for high priority jobs and another queue that uses Amazon EC2 Spot Instances for low-priority jobs. Job queues have a priority that is used by the scheduler to determine which jobs in which queue should be evaluated for execution first.
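
To make the priority rule concrete: for queues that share a compute environment, a higher integer priority is evaluated first. The sketch below models only that ordering with two hypothetical queues; it is not how the Batch scheduler is implemented.

```python
# Hypothetical queues sharing one compute environment.
queues = [
    {'jobQueueName': 'spot_low_priority', 'priority': 1},
    {'jobQueueName': 'on_demand_high_priority', 'priority': 10},
]

# Higher priority values are evaluated first.
evaluation_order = sorted(queues, key=lambda q: q['priority'], reverse=True)
names = [q['jobQueueName'] for q in evaluation_order]
print(names)
```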

[batch.create_job_queue](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.create_job_queue)

In [None]:
def create_job_queue(computeEnvironmentName, workshopUser, priority):
    jobQueueName = computeEnvironmentName + '_queue'
    response = batch.create_job_queue(jobQueueName=jobQueueName,
                                      priority=priority,
                                      computeEnvironmentOrder=[
                                          {
                                              'order': 1,
                                              'computeEnvironment': computeEnvironmentName
                                          }
                                      ])

    while True:
        describe = batch.describe_job_queues(jobQueues=[jobQueueName])
        jobQueue = describe['jobQueues'][0]
        status = jobQueue['status']
        if status == 'VALID':
            print('\rSuccessfully created job queue {}'.format(jobQueueName))
            break
        elif status == 'INVALID':
            reason = jobQueue['statusReason']
            raise Exception('Failed to create job queue: {}'.format(reason))
        print('\rCreating job queue... ')
        time.sleep(5)

    return response

In [None]:
resp = create_job_queue(default_env, workshop_user, 1)
fastq_queue_arn = resp['jobQueueArn']
fastq_queue = resp['jobQueueName']
print(fastq_queue_arn)

### [Create AWS Batch Job Definition](https://docs.aws.amazon.com/batch/latest/userguide/create-job-definition.html)

AWS Batch job definitions specify how jobs are to be run. While each job must reference a job definition, many of the parameters that are specified in the job definition can be overridden at runtime.
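
The job definition below uses a `Ref::InputFile` placeholder in its command, with a default value that can be overridden at submit time. The sketch below emulates that substitution in plain Python to show the precedence; `resolve_command` is a hypothetical helper, not the Batch implementation.

```python
def resolve_command(command, defaults, overrides=None):
    """Replace 'Ref::name' placeholders with parameter values.

    Submit-time overrides win over job definition defaults. Placeholders with
    no matching parameter are left unchanged (a choice made for this sketch).
    """
    params = dict(defaults)
    params.update(overrides or {})
    resolved = []
    for arg in command:
        if arg.startswith('Ref::'):
            resolved.append(params.get(arg[len('Ref::'):], arg))
        else:
            resolved.append(arg)
    return resolved


prefix = 's3://1000genomes/phase3/data/NA21144/sequence_read/'
command = ['fastqc.sh', 'Ref::InputFile']
defaults = {'InputFile': prefix + 'ERR047877.filt.fastq.gz'}

# Without overrides the default from the job definition is used.
print(resolve_command(command, defaults))
# A parameter supplied at submit time takes precedence.
print(resolve_command(command, defaults, {'InputFile': prefix + 'ERR047878.filt.fastq.gz'}))
```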

[batch.register_job_definition](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/batch.html#Batch.Client.register_job_definition)

In [None]:
job_def = batch.register_job_definition(
 jobDefinitionName=job_def_name,
 type='container',
 parameters={
 'InputFile': 's3://1000genomes/phase3/data/NA21144/sequence_read/ERR047877.filt.fastq.gz'
 },
 containerProperties={
 'image': registry_with_name,
 'vcpus': 1,
 'memory': 512,
 'command': [
 'fastqc.sh', 
 'Ref::InputFile'
 ],
 'jobRoleArn': batch_task_role_arn
 }
)

print(job_def)

### Submit Job from the console

We will use the console to submit the job, but you can also use the CLI and SDKs. Open the link printed below, click `Create Job`, and fill in the following settings:

* Job name: `FASTQCJobDemo`
* Job definition: `fastqc_demo_job_<workshop_user>:1` (the value of `job_def_name`, revision 1)
* Job queue: `FastQCEnvironment_<workshop_user>_queue` (the value of `fastq_queue`)
* Job type: `Single`
* Job attempts: `1`

All other settings can be left at their defaults. For the parameters:

* InputFile: `s3://1000genomes/phase3/data/NA21144/sequence_read/ERR047877.filt.fastq.gz`

Finally, the command should already be populated; check that it contains:

* `fastqc.sh Ref::InputFile` in the space delimited section.

In [None]:
print('https://{0}.console.aws.amazon.com/batch/home?region={0}#/jobs/new'.format(region))

### View results of FastQC job 

In [None]:
print('https://s3.{0}.amazonaws.com/{1}/ERR047877.filt_fastqc.html'.format(region, bucket))

### Submit multiple files from the CLI

You can submit the jobs with either the boto3 API or the AWS Batch CLI.
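
Before submitting many jobs it can help to build and check the request arguments first. The sketch below prepares keyword arguments for `batch.submit_job` without calling AWS; `build_submission` is a hypothetical helper, the queue and job definition names assume `workshop_user = 'hpc'`, and the name pattern reflects the documented job name rule (alphanumeric first character, up to 128 letters, numbers, hyphens, and underscores).

```python
import re

S3_PREFIX = 's3://1000genomes/phase3/data/NA21144/sequence_read/'
JOB_NAME_PATTERN = re.compile(r'^[A-Za-z0-9][A-Za-z0-9_-]{0,127}$')


def build_submission(run_id, job_queue, job_definition):
    """Return keyword arguments for batch.submit_job for one sequencing run."""
    job_name = 'FastQC-API-{}'.format(run_id)
    if not JOB_NAME_PATTERN.match(job_name):
        raise ValueError('invalid job name: {!r}'.format(job_name))
    return {
        'jobName': job_name,
        'jobQueue': job_queue,
        'jobDefinition': job_definition,
        'parameters': {'InputFile': '{}{}.filt.fastq.gz'.format(S3_PREFIX, run_id)},
    }


# Assumed names for illustration; in the notebook use fastq_queue and job_def_name.
submissions = [
    build_submission(run_id, 'FastQCEnvironment_hpc_queue', 'fastqc_demo_job_hpc')
    for run_id in ['ERR047877', 'ERR047878']
]
for kwargs in submissions:
    print(kwargs['jobName'], kwargs['parameters']['InputFile'])
```

Each dictionary could then be passed as `batch.submit_job(**kwargs)`.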


In [None]:
files_to_process = ['ERR047877', 'ERR047878', 'ERR047879', 'ERR048950', 'ERR048951', 'ERR048952', 'ERR251691', 'ERR251692']
for fn in files_to_process:
 print(f"Submit {fn}")
 sb_resp = batch.submit_job(jobName=f"FastQC-API-{fn}", 
 jobQueue=fastq_queue, 
 jobDefinition=job_def_name, 
 parameters={'InputFile':f"s3://1000genomes/phase3/data/NA21144/sequence_read/{fn}.filt.fastq.gz"})
 print(sb_resp['jobName'], sb_resp['jobId'] )

In [None]:
!aws batch submit-job --job-name FastQC-CLI1 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR047877.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI2 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR047878.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI3 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR047879.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI4 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR048950.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI5 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR048951.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI6 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR048952.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI7 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR251691.filt.fastq.gz
!aws batch submit-job --job-name FastQC-CLI8 --job-queue $fastq_queue --job-definition $job_def_name --parameters InputFile=s3://1000genomes/phase3/data/NA21144/sequence_read/ERR251692.filt.fastq.gz

### Monitor results of the jobs in the AWS Batch Dashboard

In [None]:
print('https://{0}.console.aws.amazon.com/batch/home?region={0}#/dashboard'.format(region))

## Cleanup

In [None]:
def delete_compute_environment(computeEnvironment):
    # A compute environment must be disabled before it can be deleted.
    response = batch.update_compute_environment(
        computeEnvironment=computeEnvironment,
        state='DISABLED',
    )

    while True:
        response = batch.describe_compute_environments(
            computeEnvironments=[computeEnvironment])
        assert len(response['computeEnvironments']) == 1
        env = response['computeEnvironments'][0]
        state = env['state']
        status = env['status']
        if status == 'UPDATING':
            print("Environment %r is updating, waiting..." % (computeEnvironment,))

        elif state == 'DISABLED':
            break

        else:
            raise RuntimeError('Expected status=UPDATING or state=DISABLED, '
                               'but status=%r and state=%r' % (status, state))

        # wait a little bit before checking again.
        time.sleep(15)

    response = batch.delete_compute_environment(
        computeEnvironment=computeEnvironment
    )

    time.sleep(5)
    response = describe_compute_environments([computeEnvironment])

    # Poll until the environment is gone; an empty list means deletion has finished.
    while (response['computeEnvironments']
           and response['computeEnvironments'][0]['status'] == 'DELETING'):
        time.sleep(5)
        response = describe_compute_environments([computeEnvironment])

    return response


def describe_compute_environments(compute_envs):
    try:
        response = batch.describe_compute_environments(
            computeEnvironments=compute_envs,
        )
    except ClientError as e:
        print(e.response['Error']['Message'])
        raise

    return response


def delete_job_queue(job_queue):
    job_queues = [job_queue]
    response = describe_job_queues(job_queues)

    if response['jobQueues'][0]['state'] != 'DISABLED':
        try:
            batch.update_job_queue(
                jobQueue=job_queue,
                state='DISABLED'
            )
        except ClientError as e:
            print(e.response['Error']['Message'])
            raise

    terminate_jobs(job_queue)

    # Wait until job queue is DISABLED
    response = describe_job_queues(job_queues)

    while response['jobQueues'][0]['state'] != 'DISABLED':
        time.sleep(5)
        response = describe_job_queues(job_queues)

    time.sleep(10)
    if response['jobQueues'][0]['status'] != 'DELETING':
        try:
            batch.delete_job_queue(
                jobQueue=job_queue,
            )
        except ClientError as e:
            print(e.response['Error']['Message'])
            raise

    response = describe_job_queues(job_queues)

    # Poll until the queue is gone; an empty list means deletion has finished.
    while response['jobQueues'] and response['jobQueues'][0]['status'] == 'DELETING':
        time.sleep(5)
        response = describe_job_queues(job_queues)


def describe_job_queues(job_queues):
    try:
        response = batch.describe_job_queues(
            jobQueues=job_queues
        )
    except ClientError as e:
        print(e.response['Error']['Message'])
        raise

    return response


def terminate_jobs(job_queue):
    response = list_jobs(job_queue)
    for job in response['jobSummaryList']:
        batch.terminate_job(
            jobId=job['jobId'],
            reason='Removing Batch Environment'
        )
    # Follow pagination tokens until every page of jobs has been processed.
    while response.get('nextToken', None) is not None:
        response = list_jobs(job_queue, response['nextToken'])
        for job in response['jobSummaryList']:
            batch.terminate_job(
                jobId=job['jobId'],
                reason='Removing Batch Environment'
            )


def list_jobs(job_queue, next_token=""):
    try:
        if next_token:
            response = batch.list_jobs(
                jobQueue=job_queue,
                nextToken=next_token
            )
        else:
            response = batch.list_jobs(
                jobQueue=job_queue,
            )
    except ClientError as e:
        print(e.response['Error']['Message'])
        raise

    return response

In [None]:
print(job_def_name)
response = batch.deregister_job_definition(jobDefinition=job_def_name+':1')

In [None]:
resp = delete_job_queue(fastq_queue)

In [None]:
resp = delete_compute_environment(default_ce)

In [None]:
response = ec2_client.delete_security_group(GroupId=batch_sec_group_id)

In [None]:
response = iam.remove_role_from_instance_profile(
 InstanceProfileName=instance_profile_name,
 RoleName=batch_instance_role
)

In [None]:
response = iam.delete_instance_profile(
 InstanceProfileName=instance_profile_name
)

In [None]:
response = cfn.delete_stack(StackName=iam_stack_name)

In [None]:
waiter = cfn.get_waiter('stack_delete_complete')
waiter.wait(
 StackName=iam_stack_name
)

print('The wait is over for {0}'.format(iam_stack_name))

In [None]:
response = ecr.delete_repository(
 registryId=account_id,
 repositoryName=repo,
 force=True
)

In [None]:
!aws s3 rb s3://$bucket --force 

In [None]:
if not use_existing:
 workshop.vpc_cleanup(vpc_id)