High performance cluster running Windows on the AWS Cloud
Quick Start Reference Deployment

August 2021
Sudhir Amin, Microsoft Workloads and Dave May, AWS Integration and Automation team
Visit our GitHub repository for source files and to post feedback, report bugs, or submit feature ideas for this Quick Start.
This Quick Start was created by Amazon Web Services (AWS). Quick Starts are automated reference deployments that use AWS CloudFormation templates to deploy key technologies on AWS, following AWS best practices.
Overview
This guide provides instructions for deploying Microsoft HPC Pack 2019 software on Amazon EC2 Windows instances in the AWS Cloud.
With HPC Pack, you can create and manage HPC clusters that consist of Amazon EC2 Windows instances. The base infrastructure uses cloud-native AWS services such as Amazon FSx for Windows File Server for a high-performance file system and AWS Managed Microsoft AD for identity, along with Amazon EC2 features such as Auto Scaling and placement groups. This Quick Start helps you quickly evaluate the architecture components required for a successful proof of concept and understand the recommended architecture patterns and best practices.
You will also learn how to manage HPC Pack cluster capacity by using Amazon EC2 Auto Scaling, Amazon CloudWatch, and AWS Systems Manager.
Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.
High performance cluster running Windows on AWS
This Quick Start deploys an HPC Pack cluster with a single head node running Microsoft HPC Pack 2019 in an Active Directory forest. The base infrastructure includes a highly available Active Directory and a fully managed, highly reliable, and scalable Windows-based network file system that is accessible over the industry-standard Server Message Block (SMB) protocol. This Quick Start supports Microsoft HPC Pack 2019 on Windows Server 2019 with a local data store that uses SQL Server Standard edition.
The architecture uses the following key AWS services and features:
- Amazon Machine Image for Head Node - Windows Server 2019 and SQL Server Standard Edition 2019 AMI provided by AWS.
- Amazon Machine Image for Compute Node - Windows Server 2019 provided by AWS.
- Amazon FSx for Windows File Server.
- AWS Managed Microsoft AD.
- Amazon S3 - Cloud Object Storage.
The automation in this deployment uses AWS Systems Manager Automation, AWS CloudFormation, and Windows PowerShell to deploy the architecture.
AWS costs
You are responsible for the cost of the AWS services and any third-party licenses used while running this Quick Start. There is no additional cost for using the Quick Start.
The AWS CloudFormation templates for Quick Starts include configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.
After you deploy the Quick Start, create AWS Cost and Usage Reports to deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. These reports provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, see What are AWS Cost and Usage Reports?
Software licenses
No licenses are required to deploy this Quick Start. The EC2 instances deployed by this Quick Start are license-included.
Architecture
Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following Microsoft HPC cluster environment in the AWS Cloud.

As shown in Figure 1, the Quick Start sets up the following:
- A highly available architecture that spans two Availability Zones.
- A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.
- In the public subnets:
  - Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.
  - A Windows jump box in a public subnet for access to resources in the private subnets.
- In the private subnets:
  - An HPC head node to initiate HPC workloads.
  - An Auto Scaling group of HPC worker nodes.
- An AWS Managed Microsoft AD directory.
- A managed Amazon FSx file system for shared storage.
Planning the deployment
Specialized knowledge
This deployment requires a moderate level of familiarity with AWS services. If you’re new to AWS, see Getting Started Resource Center and AWS Training and Certification. These sites provide materials for learning how to design, deploy, and operate your infrastructure and applications on the AWS Cloud.
This Quick Start also assumes familiarity with HPC workloads.
HPC Cluster Basics
HPC Head Node
HPC Compute Cluster
High Performance File System
AWS account
If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.
Your AWS account is automatically signed up for all AWS services. You are charged only for the services you use.
Technical requirements
Before you launch the Quick Start, review the following information and ensure that your account is properly configured. Otherwise, deployment might fail.
Resource quotas
If necessary, request service quota increases for the following resources. You might need to request increases if your existing deployment currently uses these resources and if this Quick Start deployment could result in exceeding the default quotas. The Service Quotas console displays your usage and quotas for some aspects of some services. For more information, see What is Service Quotas? and AWS service quotas.
Resource | This deployment uses |
---|---|
VPCs | 1 |
Elastic IP addresses | 1 |
Security groups | 4 |
AWS Identity and Access Management (IAM) roles | 3 |
Auto Scaling groups | 1 |
EC2 instances | 1 |
AWS Managed Directories | 1 |
Amazon FSx shares | 1 |
Supported AWS Regions
For any Quick Start to work in a Region other than its default Region, all the services it deploys must be supported in that Region. You can launch a Quick Start in any Region and see if it works. If you get an error such as “Unrecognized resource type,” the Quick Start is not supported in that Region.
For an up-to-date list of AWS Regions and the AWS services they support, see AWS Regional Services.
Certain Regions are available on an opt-in basis. For more information, see Managing AWS Regions.
IAM permissions
Before launching the Quick Start, you must sign in to the AWS Management Console with IAM permissions for the resources that the templates deploy. The AdministratorAccess managed policy within IAM provides sufficient permissions, although your organization may choose to use a custom policy with more restrictions. For more information, see AWS managed policies for job functions.
Deployment options
This Quick Start provides the following deployment option:
- Deploy a Microsoft HPC cluster into a new VPC. This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components. It then deploys the Microsoft HPC cluster into this new VPC.
The Quick Start lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Microsoft HPC cluster settings, as discussed later in this guide.
Deployment steps
Prerequisites
- Prepare a PFX certificate.
On Windows 10, Windows Server 2016, or later, run the following PowerShell command to create a self-signed certificate:
New-SelfSignedCertificate -Subject "CN=HPC Pack 2019 Communication" -KeySpec KeyExchange -KeyLength 2048 -TextExtension @("2.5.29.37={text}1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2") -CertStoreLocation cert:\CurrentUser\My -KeyExportPolicy Exportable -HashAlgorithm SHA256 -Provider "Microsoft Enhanced RSA and AES Cryptographic Provider" -NotAfter (Get-Date).AddYears(5) -NotBefore (Get-Date).AddDays(-1)
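The command above places the certificate in the current user's certificate store, whereas the Quick Start expects a PFX file. The following is a minimal sketch of exporting it, assuming the built-in `Export-PfxCertificate` cmdlet from the Windows PKI module. The function name `Export-HpcCommunicationCertificate` and the output path in the usage note are hypothetical and are not part of the Quick Start.

```powershell
# Hypothetical helper: exports the certificate created above to a password-protected PFX file.
function Export-HpcCommunicationCertificate {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [Parameter(Mandatory)] [securestring] $Password,
        # Subject used by the New-SelfSignedCertificate command above.
        [string] $Subject = 'CN=HPC Pack 2019 Communication'
    )

    # Pick the most recently expiring certificate with that subject from the user's store.
    $cert = Get-ChildItem -Path Cert:\CurrentUser\My |
        Where-Object { $_.Subject -eq $Subject } |
        Sort-Object -Property NotAfter -Descending |
        Select-Object -First 1

    if (-not $cert) {
        throw "No certificate with subject '$Subject' was found in Cert:\CurrentUser\My."
    }

    # Write the certificate together with its private key to the PFX file.
    Export-PfxCertificate -Cert $cert -FilePath $FilePath -Password $Password
}
```

For example, `Export-HpcCommunicationCertificate -FilePath .\hpc-comm.pfx -Password (Read-Host -Prompt 'PFX password' -AsSecureString)` prompts for a password and writes the file to the current directory.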
- Upload the certificate to an Amazon S3 bucket.
Before deploying this Quick Start, upload the PFX certificate to an Amazon S3 bucket.
The Quick Start uses the Certificate S3 bucket and Certificate S3 key parameters to locate the certificate.
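The upload can be done from the console or from PowerShell. The sketch below assumes that the AWS Tools for PowerShell are installed with credentials and a default Region configured, and uses the `Write-S3Object` cmdlet. The function name `Publish-HpcCertificate`, the bucket name, and the key in the usage note are hypothetical placeholders.

```powershell
# Hypothetical helper: uploads a PFX file to S3 so the Quick Start can locate it.
# Assumes the AWS Tools for PowerShell (S3 cmdlets) are installed and credentials are configured.
function Publish-HpcCertificate {
    param(
        [Parameter(Mandatory)] [string] $BucketName,  # value for "Certificate S3 bucket"
        [Parameter(Mandatory)] [string] $Key,         # value for "Certificate S3 key"
        [Parameter(Mandatory)] [string] $FilePath     # local PFX file
    )

    # Fail early, before calling AWS, if the local file is missing.
    if (-not (Test-Path -Path $FilePath -PathType Leaf)) {
        throw "PFX file not found: $FilePath"
    }

    # Resolve to a full path because the cmdlet does not track PowerShell's current location.
    $fullPath = (Resolve-Path -Path $FilePath).Path
    Write-S3Object -BucketName $BucketName -Key $Key -File $fullPath
}
```

For example, `Publish-HpcCertificate -BucketName my-hpc-bucket -Key certs/hpc-comm.pfx -FilePath .\hpc-comm.pfx`. The same bucket name and key are then entered for the two certificate parameters when you launch the stack.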
Confirm your AWS account configuration
- Sign in to your AWS account at https://aws.amazon.com with an IAM user role that has the necessary permissions. For details, see Planning the deployment earlier in this guide.
- Make sure that your AWS account is configured correctly, as discussed in the Technical requirements section.
Launch the Quick Start
If you’re deploying Microsoft HPC cluster into an existing VPC, make sure that your VPC has two private subnets in different Availability Zones for the workload instances and that the subnets aren’t shared. This Quick Start doesn’t support shared subnets. These subnets require NAT gateways in their route tables to allow the instances to download packages and software without exposing them to the internet. Also make sure that the domain name option in the DHCP options is configured as explained in DHCP options sets. You provide your VPC settings when you launch the Quick Start.
Each deployment takes about 15 minutes to complete.
- Sign in to your AWS account, and choose one of the following options to launch the AWS CloudFormation template. For help with choosing an option, see Deployment options earlier in this guide.
- Check the AWS Region that’s displayed in the upper-right corner of the navigation bar, and change it if necessary. This Region is where you build the network infrastructure. The template is launched in the us-east-1 Region by default. For other choices, see Supported AWS Regions earlier in this guide.
- On the Create stack page, keep the default setting for the template URL, and then choose Next.
- On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. For details on each parameter, see the Parameter reference section of this guide. When you finish reviewing and customizing the parameters, choose Next.
- On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.
- On the Review page, review and confirm the template settings. Under Capabilities, select the two check boxes to acknowledge that the template creates IAM resources and might require the ability to automatically expand macros.
- Choose Create stack to deploy the stack.
- Monitor the status of the stack. When the status is CREATE_COMPLETE, the Microsoft HPC cluster deployment is ready.
- To view the created resources, see the values displayed in the Outputs tab for the stack.
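The stack status and outputs can also be read from PowerShell instead of the console. This is a sketch that assumes the AWS Tools for PowerShell are installed and configured. It uses the `Get-CFNStack` cmdlet, and both the function name `Get-HpcStackOutput` and the stack name in the usage note are hypothetical.

```powershell
# Hypothetical helper: lists the outputs of a CloudFormation stack.
# Assumes the AWS Tools for PowerShell (CloudFormation cmdlets) are installed and configured.
function Get-HpcStackOutput {
    param(
        [Parameter(Mandatory)] [string] $StackName
    )

    $stack = Get-CFNStack -StackName $StackName

    # Warn if the deployment has not finished successfully yet.
    $status = "$($stack.StackStatus)"
    if ($status -ne 'CREATE_COMPLETE') {
        Write-Warning "Stack '$StackName' is in status $status."
    }

    # Each output has a key, a value, and an optional description.
    $stack.Outputs | Select-Object -Property OutputKey, OutputValue, Description
}
```

For example, `Get-HpcStackOutput -StackName my-hpc-stack | Format-Table -AutoSize` prints the same values that the Outputs tab shows.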
Post-deployment steps
Getting Started
After the deployment finishes, review the values on the Outputs tab of the AWS CloudFormation stack.
Log in to Head Node
The Quick Start deploys a jump box EC2 instance with the HPC management client libraries installed. You can use this Windows instance to perform all cluster administration and job management tasks remotely.
Scale your Cluster
To scale your cluster manually, update the desired capacity of the Auto Scaling group that contains the worker nodes.
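As a sketch of a manual scaling change from PowerShell, the function below reads the group's configured limits and then sets a new desired capacity. It assumes the AWS Tools for PowerShell are installed and configured, and it uses the `Get-ASAutoScalingGroup` and `Update-ASAutoScalingGroup` cmdlets. The function name `Set-HpcWorkerCount` is hypothetical, and the actual Auto Scaling group name must be taken from your own deployment.

```powershell
# Hypothetical helper: sets the number of worker nodes in an Auto Scaling group.
# Assumes the AWS Tools for PowerShell (Auto Scaling cmdlets) are installed and configured.
function Set-HpcWorkerCount {
    param(
        [Parameter(Mandatory)] [string] $AutoScalingGroupName,
        [Parameter(Mandatory)] [int] $DesiredCapacity
    )

    $group = Get-ASAutoScalingGroup -AutoScalingGroupName $AutoScalingGroupName
    if (-not $group) {
        throw "Auto Scaling group '$AutoScalingGroupName' was not found."
    }

    # The desired capacity must stay within the group's configured minimum and maximum.
    if ($DesiredCapacity -lt $group.MinSize -or $DesiredCapacity -gt $group.MaxSize) {
        throw "Desired capacity $DesiredCapacity is outside the range $($group.MinSize)-$($group.MaxSize)."
    }

    Update-ASAutoScalingGroup -AutoScalingGroupName $AutoScalingGroupName -DesiredCapacity $DesiredCapacity
}
```

For example, `Set-HpcWorkerCount -AutoScalingGroupName <your-group-name> -DesiredCapacity 4` requests four worker nodes. New nodes become usable only after the launch template's user data script finishes.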
Best practices for using Microsoft HPC cluster on AWS
Selecting Compute
For Head Node,
For Compute Node,
Selecting network
Using Single AZ and private subnet
Security Groups
Selecting storage
For high performance parallel FileSystem,
For object storage,
For data movement, use AWS DataSync.
Placement Groups
Use cluster placement groups for tightly coupled HPC workloads. HPC applications that exchange a large amount of data between nodes can use a cluster placement group to get low latency and high throughput.
The Quick Start uses the cluster placement group strategy in the Amazon EC2 launch template to place all worker nodes in a single Availability Zone.
- Why a placement group?
To learn more, see the documentation excerpt below.
A cluster placement group is a logical grouping of instances within a single Availability Zone. A cluster placement group can span peered VPCs in the same Region. Instances in the same cluster placement group enjoy a higher per-flow throughput limit for TCP/IP traffic and are placed in the same high-bisection bandwidth segment of the network.
Optimize Compute Node AMI
The Quick Start is configured to pull the latest Windows Server AMI provided by AWS. When you scale your cluster by using the Auto Scaling group, the user data script in the launch template installs the Microsoft HPC Pack software and the CloudWatch agent during boot. This can significantly increase the time needed to prepare the fleet of compute workers and can delay job submission.
To optimize further, you can use Amazon EC2 Image Builder to fully bake the compute node AMI, and then update the launch template with the AMI ID of the custom image.
To learn more, see the EC2 Image Builder documentation.
Autoscaling Best practices
Whether to use automatic scaling depends on your requirements. The following example scenarios show how automatic scaling can improve efficiency:
- Scaling based on fixed capacity.
- Scaling based on queued tasks.
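To make the queued-tasks scenario concrete, the sketch below turns a count of queued tasks into a target number of worker nodes. It is only an illustration of the sizing arithmetic and not the Quick Start's own scaling logic. The function name `Get-DesiredNodeCount`, the tasks-per-node ratio, and the node limits are all assumptions to replace with values that fit your workload.

```powershell
# Hypothetical helper: computes how many worker nodes a queue of tasks calls for.
function Get-DesiredNodeCount {
    param(
        [Parameter(Mandatory)] [int] $QueuedTasks,
        [int] $TasksPerNode = 4,   # assumed number of tasks one node runs at a time
        [int] $MinNodes = 0,       # assumed lower bound of the Auto Scaling group
        [int] $MaxNodes = 10       # assumed upper bound of the Auto Scaling group
    )

    if ($QueuedTasks -lt 0) { throw 'QueuedTasks must not be negative.' }
    if ($TasksPerNode -lt 1) { throw 'TasksPerNode must be at least 1.' }
    if ($MinNodes -gt $MaxNodes) { throw 'MinNodes must not exceed MaxNodes.' }

    # Round up so that a partially filled node still gets provisioned.
    $needed = [int][Math]::Ceiling([double]$QueuedTasks / $TasksPerNode)

    # Keep the result within the configured limits.
    [Math]::Max($MinNodes, [Math]::Min($MaxNodes, $needed))
}
```

With the defaults, nine queued tasks call for three nodes, an empty queue scales to zero, and a very long queue is capped at ten nodes. The result could be used as the desired capacity of the worker Auto Scaling group, while the fixed-capacity scenario corresponds to setting `MinNodes` and `MaxNodes` to the same value.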

Security
- Certificates
- Security groups for worker nodes, the head node, RDP access, and so on
- Domain-joined vs. non-domain-joined worker nodes
Other useful information
FAQ
Q. I encountered a CREATE_FAILED error when I launched the Quick Start.
A. If AWS CloudFormation fails to create the stack, relaunch the template with Rollback on failure set to Disabled. This setting is under Advanced in the AWS CloudFormation console on the Configure stack options page. With this setting, the stack’s state is retained, and the instance keeps running so that you can troubleshoot the issue. (For Windows, look at the log files in %ProgramFiles%\Amazon\EC2ConfigService and C:\cfn\log.)
When you set Rollback on failure to Disabled, you continue to incur AWS charges for this stack. Delete the stack when you finish troubleshooting.
For more information, see Troubleshooting AWS CloudFormation.
Q. I encountered a size-limitation error when I deployed the AWS CloudFormation templates.
A. Launch the Quick Start templates from the links in this guide or from another S3 bucket. If you deploy the templates from a local copy on your computer or from a location other than an S3 bucket, you might encounter template-size limitations. For more information, see AWS CloudFormation quotas.
Troubleshooting
Customer responsibility
After you successfully deploy this Quick Start, confirm that your resources and services are updated and configured — including any required patches — to meet your security and other needs. For more information, see the AWS Shared Responsibility Model.
Parameter reference
Unless you are customizing the Quick Start templates for your own deployment projects, keep the default settings for the parameters labeled Quick Start S3 bucket name, Quick Start S3 bucket Region, and Quick Start S3 key prefix. Changing these parameter settings automatically updates code references to point to a new Quick Start location. For more information, see the AWS Quick Start Contributor’s Guide.
Send us feedback
To post feedback, submit feature ideas, or report bugs, use the Issues section of the GitHub repository for this Quick Start. To submit code, see the Quick Start Contributor’s Guide.
Quick Start reference deployments
See the AWS Quick Start home page.
GitHub repository
Visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.
Notices
This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.
The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for specific language governing permissions and limitations.