# Bioinformatics tools examples

After [deploying the CDK genomics pipeline project](GITHUB URL) you can test the genomics tools directly with AWS Batch or start a Step Functions pipeline.

### Testing bioinformatics tools using AWS Batch

Create a file named `batch-TOOL_NAME.json`.

```
{
    "jobName": "",
    "jobQueue": "",
    "jobDefinition": "",
    "containerOverrides": {
        "vcpus": 1,
        "memory": 1000,
        "command": [""],
        "environment": [{
                "name": "JOB_INPUTS",
                "value": ""
            },
            {
                "name": "JOB_OUTPUTS",
                "value": ""
            },
            {
                "name": "JOB_OUTPUT_PREFIX",
                "value": ""
            }
        ]
    }
}
```

**jobName** (string)
The name of the job. The first character must be alphanumeric, and up to 128 letters (uppercase and lowercase), numbers, hyphens, and underscores are allowed.

**jobQueue** (string)
The [job queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) into which the job is submitted. You can specify either the name or the Amazon Resource Name (ARN) of the queue.

**jobDefinition** (string)
The [job definition](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) used by this job. This value can be one of `name`, `name:revision`, or the Amazon Resource Name (ARN) for the job definition. If `name` is specified without a revision, the latest active revision is used.

**containerOverrides.vcpus** (integer, optional)
The number of vCPUs to reserve for the container. This value overrides the value set in the job definition.

**containerOverrides.memory** (integer, optional)
The number of MiB of memory reserved for the job. This value overrides the value set in the job definition.

**containerOverrides.command** (list)
The command to send to the container that overrides the default command from the Docker image or the job definition.

**containerOverrides.environment** (list)
The environment variables to send to the container. You can add new environment variables, which are added to the container at launch, or you can override the existing environment variables from the Docker image or the job definition.

- (structure) A key-value pair object.
    - **name** (string) The name of the key-value pair. For environment variables, this is the name of the environment variable.
    - **value** (string) The value of the key-value pair. For environment variables, this is the value of the environment variable.

Example for a `batch-fastqc.json`

```
{
    "jobName": "fastqc",
    "jobQueue": "genomics-default-queue",
    "jobDefinition": "genomics-fastqc:1",
    "containerOverrides": {
        "vcpus": 1,
        "memory": 1000,
        "command": ["fastqc *.gz"],
        "environment": [{
                "name": "JOB_INPUTS",
                "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035_R*.fastq.gz"
            },
            {
                "name": "JOB_OUTPUTS",
                "value": "*.html *.zip"
            },
            {
                "name": "JOB_OUTPUT_PREFIX",
                "value": "s3://my-genomics-bucket-name/some-folder-name"
            }
        ]
    }
}
```

In this example we run the FastQC tool, which takes fastq files and generates a report. It outputs zip and html files, which we save to an S3 bucket.

- **jobName** - "fastqc". A name that describes the job to be run.
- **jobQueue** - "genomics-default-queue". A valid name of a job queue. This can be found in the AWS web console > Batch > Job queues.
- **jobDefinition** - "genomics-fastqc:1". A valid and active job definition and its revision. This can be found in the AWS web console > Batch > Job definitions.
- **containerOverrides.vcpus** - 1. Request a machine that has at least 1 vCPU.
- **containerOverrides.memory** - 1000. Request a machine that has at least 1000 MiB of RAM.
- **containerOverrides.command** - ["fastqc *.gz"]. Run the fastqc command on all the .gz files in the working directory.
- **containerOverrides.environment** - A list of key-value pairs.
    - **name**: JOB_INPUTS. **value**: fastq files from a source S3 bucket.
    - **name**: JOB_OUTPUTS. **value**: `"*.html *.zip"`. Copy all html and zip files from a local directory to an S3 bucket.
    - **name**: JOB_OUTPUT_PREFIX. **value**: An S3 bucket and a prefix (folder) to copy the output files into.

There are several examples under the `examples` directory. To run an example, edit the example file you want to run (e.g., `examples/batch-fastqc-job.json`). Update `JOB_INPUTS` to a valid source of your sample fastq files, or leave the default value to use a demo sample. Update `JOB_OUTPUT_PREFIX` to a valid S3 bucket and subfolder where the output zip and html files should be saved.

Change to the examples directory and then submit the job to Batch.

```
cd examples
aws batch submit-job --cli-input-json file://batch-fastqc-job.json
```

Navigate to the Batch jobs page (AWS console -> AWS Batch -> Jobs) and select the job queue you used (e.g., `genomics-default-queue`) to track the progress of the job. You can click on the job name and then click on the Log stream name link to follow the stdout of the running task.
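
If you prefer to generate the job file rather than edit it by hand, the structure described above can be built with a short script. The following is a minimal Python sketch, not part of the project: `build_batch_job` and `JOB_NAME_PATTERN` are hypothetical names introduced here, and the regular expression is only one reading of the job name rule quoted earlier (first character alphanumeric, at most 128 letters, numbers, hyphens, and underscores in total). It uses only the standard library and does not call AWS.

```python
import json
import re

# Assumed reading of the job name rule: first character alphanumeric,
# followed by letters, numbers, hyphens, or underscores, 128 characters max.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,127}")


def build_batch_job(job_name, job_queue, job_definition, command,
                    job_inputs, job_outputs, job_output_prefix,
                    vcpus=1, memory=1000):
    """Return a dict with the same shape as the batch-TOOL_NAME.json template."""
    if not JOB_NAME_PATTERN.fullmatch(job_name):
        raise ValueError(f"invalid job name: {job_name!r}")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "vcpus": vcpus,      # overrides the job definition value
            "memory": memory,    # in MiB, overrides the job definition value
            "command": list(command),
            "environment": [
                {"name": "JOB_INPUTS", "value": job_inputs},
                {"name": "JOB_OUTPUTS", "value": job_outputs},
                {"name": "JOB_OUTPUT_PREFIX", "value": job_output_prefix},
            ],
        },
    }


if __name__ == "__main__":
    # Values copied from the batch-fastqc.json example above.
    job = build_batch_job(
        job_name="fastqc",
        job_queue="genomics-default-queue",
        job_definition="genomics-fastqc:1",
        command=["fastqc *.gz"],
        job_inputs="s3://aws-batch-genomics-shared/secondary-analysis/"
                   "example-files/fastq/NIST7035_R*.fastq.gz",
        job_outputs="*.html *.zip",
        job_output_prefix="s3://my-genomics-bucket-name/some-folder-name",
    )
    print(json.dumps(job, indent=4))
```

Redirecting the script's output to a file such as `batch-fastqc.json` gives you a document you can pass to `aws batch submit-job --cli-input-json file://batch-fastqc.json`, in the same way as the examples shipped in the `examples` directory.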