--- title: "Contexts" date: 2021-08-31T17:26:49-04:00 draft: false weight: 20 description: > Contexts are the set of cloud resources used to run a workflow --- ## What is a Context? A context is a set of cloud resources. Amazon Genomics CLI runs [workflows]( {{< relref "../workflows" >}} ) in a context. A deployed context will include an [engine]( {{< relref "../engines" >}}) that can interpret and manage the running of a workflow along with compute resources that will run the individual tasks of the workflow. The deployed context will also contain any resources needed by the engine or compute resources including any security, permissions and [logging]( {{< relref "../logs" >}} ) capabilities. Deployed contexts are [namespaced]( {{< relref "../namespaces" >}}) based on the user, project and context name so that resources are isolated, preventing collisions. When a workflow is run the user will decide which context will run it. For example, you might choose to submit a workflow to a context that uses "Spot priced" resources or one that uses "On Demand" priced resources. When deployed context resources that require a VPC will be deployed into the VPC that was specified when the [account]( {{< relref "../accounts" >}} ) was activated. ## How is a Context Defined? A context is defined in the YAML file that defines the [project]( {{< relref "../projects" >}} ). A project has at least one context but may have many. Contexts must have unique names and are defined as YAML maps. A context may request use of [Spot priced](https://aws.amazon.com/ec2/spot/pricing/) compute resources with `requestSpotInstances: true`. The default value is `false`. A context must define an array of one or more `engines`. Each engine definition must specify the workflow language that it will interpret. For each language Amazon Genomics CLI has a default engine however, users may specify the exact engine in the `engine` parameter. ## General Architecture of a Context The exact architecture of a context will depend on the context properties described below and defined in their `agc-project.yaml`. However, the architecture deployed on execution of `agc context deploy` is shown in the following diagram: ![Image of the general architecture of a context](ContextGeneralArchitecture.png "General Architecture of a Context") ## Context Properties ### Instance Types You may optionally specify the instance types to be used in a context. This can be a specific type such as `r5.2xlarge` or it can be an instance family such as `c5` or a combination. By default, a context will use instance types up to `4xlarge` > Note, if you only specify large instance types you will be using those instances for running even the smallest tasks so we recommend including smaller types as well. Ensure that any custom types you list are available in the region that you're using with Amazon Genomics CLI or the context will fail to deploy. You can obtain a list using the following command ```shell aws ec2 describe-instance-type-offerings \ --region ``` #### Examples The following snippet defines two contexts, one that uses spot resources and one that uses on demand. Both contain a WDL engine. ```yaml ... 
#### Examples

The following snippet defines two contexts, one that uses spot resources and one that uses on demand. Both contain a WDL engine.

```yaml
...
contexts:
  # The on demand context uses on demand EC2 instances which may be more expensive but will not be interrupted
  onDemandCtx:
    requestSpotInstances: false
    engines:
      - type: wdl
        engine: cromwell
```

```yaml
  # The spot context uses EC2 spot instances which are usually cheaper but may be interrupted
  spotCtx:
    requestSpotInstances: true
    engines:
      - type: wdl
        engine: cromwell
...
```

The following context may use any instance type from the `m5`, `c5` or `r5` families:

```yaml
contexts:
  nfLargeCtx:
    instanceTypes: [ "c5", "m5", "r5" ]
    engines:
      - type: nextflow
        engine: nextflow
```

### Max vCpus

*default:* 256

You may optionally specify the maximum number of vCPUs used in a context. This is the maximum total number of vCPUs across all jobs running concurrently within the context. When the maximum has been reached, additional jobs will be queued.

*note:* if your account's vCPU limit is lower than `maxVCpus`, you won't get as many vCPUs as requested and will need to request a limit increase.

```yaml
contexts:
  largeCtx:
    maxVCpus: 2000
    engines:
      - type: nextflow
        engine: nextflow
```

### Public Subnets

In the interest of saving money, in particular if you intend to have the AGC stack deployed for a long period, you may choose to deploy in "public subnet" mode. To do this, you must first set up the core stack using `agc account activate --usePublicSubnets`, which will disable the creation of the NAT gateway and VPC endpoints that present an ongoing cost unrelated to your use of compute resources. After you have done this, you must also set `usePublicSubnets: true` in all contexts you use:

```yaml
contexts:
  someCtx:
    usePublicSubnets: true
    engines:
      - type: nextflow
        engine: nextflow
```

This ensures that the AWS Batch instances are deployed into a public subnet, which has no additional cost associated with it. However, note that while these instances are given a security group that blocks all incoming traffic, this is not as secure as using the default private subnet mode.
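Putting the account and context settings together, a minimal end-to-end sketch might look like the following, assuming a context named `someCtx` as in the snippet above and a project YAML that already sets `usePublicSubnets: true`:

```shell
# Activate the account core stack without NAT gateways or VPC endpoints
agc account activate --usePublicSubnets

# Deploy a context whose definition includes usePublicSubnets: true
agc context deploy someCtx
```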
## Context Commands

A full reference of context commands is [here]( {{< relref "../../Reference/agc_context" >}} ).

### `describe`

The command `agc context describe [flags]` will describe the named context as defined in the project YAML, as well as other relevant account information.

### `list`

The command `agc context list [flags]` will list the names of all contexts defined in the project YAML file along with the name of the engine used by each context.

### `deploy`

The command `agc context deploy [flags]` is used to deploy the cloud infrastructure required by the context. If the context is already running, the existing infrastructure will be updated to reflect changes in the project YAML. For example, if you add another `data` definition in your project and run `agc context deploy` for that context, the deployed context will be updated to allow access to the new data.

All contexts defined in the project YAML can be deployed or updated using the `--all` flag. Individually named contexts can be deployed or updated as positional arguments. For example: `agc context deploy ctx1 ctx2` will deploy the contexts `ctx1` and `ctx2`.

The inclusion of the `--verbose` flag will show the full CloudFormation output of the context deployment.

### `destroy`

A context's cloud resources can be "destroyed" using the `agc context destroy` command. This will remove any infrastructure artifacts associated with the context unless they are defined as being retained. Typically, things like logs and workflow outputs on S3 are retained when a context is destroyed.

All deployed contexts can be destroyed using the `--all` flag. Multiple contexts can be destroyed in a single command using positional arguments. For example: `agc context destroy ctx1 ctx2` will destroy the contexts `ctx1` and `ctx2`.

### `status`

The status command is used to determine the status of a *deployed* context or context instance. This can be useful to determine if an instance of a particular context is already deployed. It can also be used to determine if the deployed context is consistent with the defined context in the project YAML file. For example, if you deploy a context instance and later change the definition of the context in the project YAML file, the running instance will no longer reflect the definition. In this case you may choose to update the deployed instance using the `agc context deploy` command.

Status will only be shown for contexts of the current user, in the current AWS region, for the current project. To show contexts for another project, issue the command from that project's home folder (or a subfolder). To display contexts for another AWS region, you can use a different AWS CLI profile or set the `AWS_REGION` environment variable to the desired region (e.g. `export AWS_REGION=us-west-2`).

{{% alert title="Warning" color="warning" %}}
Because the `status` command will only show contexts that are listed in the project YAML, you should take care to `destroy` any running contexts before deleting them from the project YAML.
{{% /alert %}}
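For example, a quick check of the current region followed by another region might look like this sketch, where `us-west-2` is only an illustrative value:

```shell
# Status of deployed contexts for the current project, user and region
agc context status

# Check contexts deployed in another region by overriding AWS_REGION for one invocation
AWS_REGION=us-west-2 agc context status
```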
## Costs

Infrastructure deployed for a context is tagged with the context name as well as the username and project name. These tags can be used with AWS Cost Explorer to identify the costs associated with running contexts.

A deployed context will incur charges based on the resources being used by the context. If a workflow is running, this will include compute costs for running the workflow tasks, but some contexts may include infrastructure that is always "on" and will incur costs even when no workflow is running. If you no longer need a context, we recommend pausing or destroying it.

If `requestSpotInstances` is true, the context will use spot instances for compute tasks. The context will set the maximum spot price to 100% of the On Demand price, although if the current spot price is lower you will pay the lower price. Note that even at 100%, spot instances can still be interrupted if total demand for on demand instances in an availability zone exceeds the available pool. For full details see [Spot Instance Interruptions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html) and [EC2 Spot Pricing](https://aws.amazon.com/ec2/spot/pricing/).

### Ongoing Costs

Until a context is destroyed, its deployed resources can incur ongoing costs even if a workflow is not running. The exact costs depend on the configuration of the context.

Amazon Genomics CLI version 1.0.1 and earlier used an AWS Fargate based WES service for each deployed context. The service uses 0.5 vCPU, 4 GB memory and 20 GB base instance storage. Fargate pricing varies by region and is detailed [here](https://aws.amazon.com/fargate/pricing/). The estimated cost is available via [this link](https://calculator.aws/#/estimate?id=9a67ba7845199cf108d85ae0f9b8176253266005). After version 1.0.1, the WES endpoints deployed by Amazon Genomics CLI are implemented with AWS Lambda and therefore use a [pricing model](https://aws.amazon.com/lambda/pricing/) based on invocations.

Contexts using a Cromwell engine run an additional AWS Fargate service for the engine with 2 vCPU, 16 GB RAM and 20 GB of base storage. Additionally, Cromwell is deployed with a standard EFS volume for storage of metadata. EFS [costs](https://aws.amazon.com/efs/pricing/) are volume based. While relatively small, the amount of metadata will expand as more workflows are run. The volume is destroyed when the context is destroyed. An estimated cost for both components is available via [this link](https://calculator.aws/#/estimate?id=8ccc606c1b267e2933a6d683c0b98fcf11e4cbab).

Contexts using the "miniwdl" or "snakemake" engines use EFS volumes as scratch space for workflow intermediates, caches and temporary files. Because many genomics workflows can accumulate several GB of intermediates per run, we recommend destroying these contexts when not in use. An estimated cost assuming a total of 500 GB of workflow artifacts is available via [this link](https://calculator.aws/#/estimate?id=4d19b43aa86fcc3af199c425bfcc55193592cbb4).

Refer to the [public subnets section](#public-subnets) if you are concerned about reducing these ongoing costs.

### Tags

All context infrastructure is [tagged]( {{< relref "namespaces#tags" >}} ) with the context name, username and project name. These tags may be used to help differentiate costs.

## Technical Details

Context infrastructure is defined as code in the form of [AWS CDK](https://aws.amazon.com/cdk/) apps. For examples, take a look at the `packages/cdk` folder. When deployed, a context will produce one or more stacks in CloudFormation. Details can be viewed in the CloudFormation console or with the AWS CLI.

A context includes an endpoint compliant with the [GA4GH WES API](https://ga4gh.github.io/workflow-execution-service-schemas/docs/). This API is how Amazon Genomics CLI submits workflows to the context. The context also contains one or more workflow engines. These may be deployed either as long-running services, as is the case with Cromwell, or as "head" jobs that are responsible for a single workflow, as is the case for Nextflow. Engines run as "head" jobs are started and stopped on demand, thereby saving resources.

### Updating Launch Templates

Changes to EC2 LaunchTemplates in CDK result in a new LaunchTemplate version when the infrastructure is updated. Currently, CDK is unable to also update the default version of the template. In addition, any existing AWS Batch Compute Environments will not be updated to use the new LaunchTemplate version. Because of this, whenever a LaunchTemplate is updated in CDK code we recommend destroying any relevant running contexts and redeploying them. An update will *NOT* be sufficient.
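In practice this means cycling the affected context rather than updating it in place; a minimal sketch, where `myCtx` is a placeholder context name:

```shell
# Recreate the context so Batch compute environments pick up the new LaunchTemplate version
agc context destroy myCtx
agc context deploy myCtx
```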