# AWS GATK Stack

Here we present a nested AWS CDK stack for deploying the infrastructure needed to run [GATK](https://gatk.broadinstitute.org/hc/en-us) workflows on AWS Batch. These stacks provision the following components: AWS Batch resources (compute environment, job queue, etc.), the IAM roles they require, a VPC (new or existing), and S3 buckets for workflow data.

## Development

The following are required for developing this stack:

* AWS CDK
* AWS CLI v2
* Node and npm (v12.x)
* Python 3.7
* Poetry

Python requirements are handled with [poetry](https://python-poetry.org) and [pre-commit](https://pre-commit.com/). After cloning this repo, use `poetry install` to set up the virtualenv, then `poetry shell` to activate it. With the environment active, run `pre-commit install` to install the git hooks. To install the CDK components, run `npm install` in the directory containing the `package.json` file. (The full command sequence is collected in the examples at the end of this README.)

## Deployment

### IAM User

Deployment should be run as an IAM user with sufficient permissions to perform the actions of this stack. To get up and running quickly, this user will need rights to perform:

* creation of AWS Batch components (compute environment, job queue, etc.)
* creation of IAM roles
* creation of VPC components

An illustrative policy is sketched in the examples at the end of this README.

### Environment variables

The following environment variables are required for deployment. More information on these variables can be found at https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html and https://docs.aws.amazon.com/cdk/latest/guide/environments.html.

* `AWS_PROFILE`
* `AWS_DEFAULT_REGION`

The app uses these variables to derive the CDK-specific environment values needed for deployment (see the environment example at the end of this README).

### Bootstrap

The first time you deploy a CDK stack to your AWS account, you need to bootstrap it. This can be done using `npx cdk bootstrap`.

### Configuration

The stack can take advantage of pre-existing VPCs or buckets if desired; these resources are configured in `props.json` (a hedged example appears at the end of this README).

* VPC
  * To use an existing VPC, set `vpc_exists` to `true` and provide the VPC name.
  * Otherwise, set `vpc_exists` to `false`.
* S3 buckets: **bucket names must be provided.** The stack uses three buckets for the batch runs:
  * `work_bucket`: the Nextflow work bucket
  * `data_bucket`: storage for data used with the Nextflow GATK workflows
  * `ref_bucket`: storage for common reference files

  To use existing buckets, set `exists` to `true` for each bucket and provide its name and ARN. To create new buckets, set `exists` to `false` and provide a name for each bucket; the name must be globally unique, i.e., not used by any other S3 bucket.

### Deploy

It's always best to test synthesis of the CloudFormation template before deployment, and you can do that with `npx cdk synth`. Once you're satisfied with the changes, deploy with `npx cdk deploy`.
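Putting the steps above together, a typical deployment session might look like the following; the profile and region values are placeholders to substitute with your own:

```sh
# Select credentials and region (placeholder values; see "Environment variables" above)
export AWS_PROFILE=gatk-admin
export AWS_DEFAULT_REGION=us-east-1

# One time per account/region: provision the CDK bootstrap resources
npx cdk bootstrap

# Synthesize the CloudFormation template and review it before deploying
npx cdk synth

# Deploy the stack
npx cdk deploy
```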
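## Examples

### Development setup

The setup described in the Development section boils down to the following commands, run from the repository root:

```sh
# Create the virtualenv and install the Python dependencies
poetry install

# Activate the virtualenv
poetry shell

# Install the git hooks (inside the activated environment)
pre-commit install

# Install the CDK components (in the directory containing package.json)
npm install
```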
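### IAM policy sketch

The exact minimal permission set depends on the stack's contents, so the policy below is a deliberately broad, illustrative sketch covering the Batch, IAM, and VPC (EC2) actions listed under "IAM User", plus the S3 and CloudFormation access a CDK deployment typically needs. It is an assumption, not the audited minimum; scope it down before using it outside of experimentation.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "IllustrativeGatkStackDeployPolicy",
      "Effect": "Allow",
      "Action": [
        "batch:*",
        "iam:*",
        "ec2:*",
        "s3:*",
        "cloudformation:*"
      ],
      "Resource": "*"
    }
  ]
}
```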
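### Deriving the CDK environment

A minimal sketch of how a CDK (v1, Python) app can turn `AWS_PROFILE` and `AWS_DEFAULT_REGION` into a CDK `Environment`. The CDK CLI resolves the active profile and region into `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION` when it invokes the app; the `GatkStack` name is a placeholder, and the app in this repo may resolve these values differently.

```python
import os

from aws_cdk import core

# The CDK CLI resolves AWS_PROFILE / AWS_DEFAULT_REGION into
# CDK_DEFAULT_ACCOUNT / CDK_DEFAULT_REGION when it invokes the app.
env = core.Environment(
    account=os.environ.get("CDK_DEFAULT_ACCOUNT"),
    region=os.environ.get("CDK_DEFAULT_REGION", os.environ.get("AWS_DEFAULT_REGION")),
)

app = core.App()
# Placeholder: instantiate this repo's stacks with the derived environment.
# GatkStack(app, "aws-gatk-stack", env=env)
app.synth()
```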
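### Example props.json

A hedged sketch of a `props.json` using the options described in the Configuration section. Only `vpc_exists`, `exists`, and the three bucket keys are named in this README; the remaining key names (`vpc_name`, `name`, `arn`) and the overall nesting are assumptions, so check the `props.json` shipped with this repo for the exact schema.

```json
{
  "vpc_exists": true,
  "vpc_name": "my-existing-vpc",
  "work_bucket": { "exists": false, "name": "my-gatk-work", "arn": "" },
  "data_bucket": { "exists": true, "name": "my-gatk-data", "arn": "arn:aws:s3:::my-gatk-data" },
  "ref_bucket": { "exists": true, "name": "my-gatk-refs", "arn": "arn:aws:s3:::my-gatk-refs" }
}
```

In this sketch an existing VPC and existing data/reference buckets are reused, while a new work bucket is created.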