ParallelCluster post-install samples

This repository gathers some ParallelCluster post-install samples for common HPC-related operations.
The primary ParallelCluster post-install script sets the environment and launches secondary scripts according to their naming convention.
All those scripts are meant to be stored on an S3 bucket. See more details in the Requirements section below. At the moment we are including:
1. *01.install.enginframe.master.sh*
   Secondary script installing NICE EnginFrame HPC portal
2. *02.install.dcv.broker.master.sh*
   Secondary script installing DCV Session Manager Broker

Software and Services used

AWS ParallelCluster is an open source cluster management tool that simplifies deploying and managing HPC clusters with Amazon FSx for Lustre, EFA, a variety of job schedulers, and the MPI library of your choice. AWS ParallelCluster simplifies cluster orchestration on AWS so that HPC environments become easy to use even if you’re new to the cloud.

NICE EnginFrame is the leading grid-enabled application portal for user-friendly submission, control and monitoring of HPC jobs and interactive remote sessions. It includes sophisticated data management for all stages of the HPC job lifetime and is integrated with the most popular job schedulers and middleware tools to submit, monitor, and manage jobs.

NICE DCV is a remote visualization technology that enables users to securely connect to graphic-intensive 3D applications hosted on a remote, high-performance server. With NICE DCV, you can make a server's high-performance graphics processing capabilities available to multiple remote users by creating secure client sessions.

NICE DCV Session Manager is a set of two software packages (an Agent and a Broker) and an application programming interface (API) that makes it easy for developers and independent software vendors (ISVs) to build front-end applications that programmatically create and manage the lifecycle of NICE DCV sessions across a fleet of NICE DCV servers.

Overview

I’ll add the following 2 options to my ParallelCluster configuration file:
post_install = s3://<bucket>/<bucket key>/scripts/post.install.sh
post_install_args = '<bucket> <bucket key> <efadmin password (optional)>'
The first one, post_install, specifies a Bash script stored on Amazon S3 as the ParallelCluster post-install script. This is my main script, which runs the secondary scripts for EnginFrame and DCV Session Manager Broker respectively.

The second parameter, post_install_args, passes a set of arguments to the above script: the S3 bucket name, the bucket folder/key location and, optionally, the efadmin password.
The secondary scripts get those arguments, detect all the other information required and proceed with the installation of the 2 components on the ParallelCluster master host.
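As a minimal sketch of how a main script might validate and export these values for the secondary scripts: the function name and the variable names S3_BUCKET, S3_KEY and EFADMIN_PASSWORD are hypothetical, not taken from the repository. How ParallelCluster positions the quoted string among the script's arguments is not covered here, so the function takes the values directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the values carried by post_install_args and
# export them so that secondary scripts can read them from the environment.
# Names are illustrative, not the repository's own.
read_post_install_args() {
    if [ "$#" -lt 2 ]; then
        echo "usage: read_post_install_args <bucket> <bucket key> [efadmin password]" >&2
        return 1
    fi
    export S3_BUCKET="$1"
    export S3_KEY="$2"
    # The password is optional; leave it empty when it is not supplied.
    export EFADMIN_PASSWORD="${3:-}"
}
```

Leaving the password empty when it is omitted is an assumption of this sketch; the provided scripts may handle a missing password differently.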

EnginFrame and DCV Session Manager Broker secondary scripts are separated, so you can potentially install just one of them.

Note: This procedure has been tested with EnginFrame version 2020.0 and DCV Session Manager Broker version 2020.2. With minor modifications it can also work with previous versions; just remember to add license management.

Walkthrough

Requirements

To perform a successful installation of EnginFrame and DCV Session Manager Broker, you’ll need:
Note: neither EnginFrame 2020 nor DCV Session Manager Broker needs a license when running on EC2 instances. For more details please refer to their documentation.

Step 1. Review and customize post-install scripts

The GitHub code repository for this article contains 3 main scripts: post.install.sh, 01.install.enginframe.master.sh and 02.install.dcv.broker.master.sh.
Secondary scripts follow this naming convention: they start with a number that sets their execution order, then describe their purpose, and finally state the node type on which they should be executed (master or compute) as the last component, just before the extension, e.g.:
01.install.enginframe.master.sh
|  |                  |      |    
|  |                  |      file extension
|  purpose            |
|                     to be run on master or compute nodes
execution order
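The convention can be decoded with plain Bash parameter expansion. The sketch below is illustrative only: the function name and the SCRIPT_* variables are hypothetical, and the provided scripts may parse names differently.

```shell
#!/usr/bin/env bash
# Split a secondary script name into the parts described by the naming
# convention: <order>.<purpose>.<node type>.<extension>
parse_script_name() {
    local name="$1"
    local stem="${name%.*}"        # drop the extension    -> 01.install.enginframe.master
    local head="${stem%.*}"        # drop the node type    -> 01.install.enginframe
    SCRIPT_ORDER="${name%%.*}"     # text before first dot -> 01
    SCRIPT_EXT="${name##*.}"       # text after last dot   -> sh
    SCRIPT_NODE="${stem##*.}"      # last field of stem    -> master
    SCRIPT_PURPOSE="${head#*.}"    # everything in between -> install.enginframe
}

parse_script_name "01.install.enginframe.master.sh"
echo "order=${SCRIPT_ORDER} purpose=${SCRIPT_PURPOSE} node=${SCRIPT_NODE} ext=${SCRIPT_EXT}"
# prints: order=01 purpose=install.enginframe node=master ext=sh
```

Because the purpose may itself contain dots (as in install.dcv.broker), the node type and extension are peeled off from the right and the order from the left.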
While the main post-install script post.install.sh just sets environment variables and launches secondary scripts, you might want to check the secondary ones: 01.install.enginframe.master.sh, which installs EnginFrame, and 02.install.dcv.broker.master.sh, which installs DCV Session Manager Broker.

Crucial parameters are set in the ParallelCluster configuration file, and some EnginFrame settings are defined in the efinstall.config file. Check all these files to make sure they reflect your intended setup.

You can also add further custom scripts, in the same folder, following the naming convention stated above. An example could be installing an HPC application locally on a compute node, or in the master shared folder.
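To illustrate how a new script would be picked up, here is a rough sketch of selecting the scripts for one node type in execution order. It is an assumption about how such a dispatcher could work, not the logic of post.install.sh; the function name and the example file 03.install.myapp.compute.sh are hypothetical.

```shell
#!/usr/bin/env bash
# List the secondary scripts in a folder that target a given node type
# ("master" or "compute"), sorted by their numeric prefix.
select_scripts() {
    local dir="$1" node_type="$2" path
    for path in "${dir}"/[0-9]*."${node_type}".sh; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "${path}" ] || continue
        printf '%s\n' "${path##*/}"
    done | LC_ALL=C sort
}
```

With the two provided scripts plus a hypothetical 03.install.myapp.compute.sh in the same folder, `select_scripts <folder> master` would list the EnginFrame script first and the broker script second, while `select_scripts <folder> compute` would list only the new one. The main script post.install.sh is skipped because its name does not start with a number.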

Each script sources /etc/parallelcluster/cfnconfig to get the required information about current cluster settings, AWS resources involved and node type. Specifically, cfnconfig defines 
Note: More details on each script are provided in the Post-install scripts details section following the Walkthrough.
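A sketch of how a script might turn the cfnconfig contents into the master/compute tag used in script names follows. It assumes that cfnconfig sets a variable cfn_node_type to MasterServer or ComputeFleet, as ParallelCluster 2.x does; check this against your own cluster. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Map the node type found in cfnconfig to the "master"/"compute" tag used in
# script names. ASSUMPTION: cfnconfig sets cfn_node_type to MasterServer or
# ComputeFleet (ParallelCluster 2.x behaviour).
node_tag_from_cfnconfig() {
    local cfnconfig="${1:-/etc/parallelcluster/cfnconfig}"
    # Source in a subshell so the caller's environment is left untouched.
    (
        # shellcheck disable=SC1090
        . "${cfnconfig}" || exit 1
        case "${cfn_node_type:-}" in
            MasterServer) echo "master" ;;
            ComputeFleet) echo "compute" ;;
            *) echo "unknown node type: ${cfn_node_type:-<unset>}" >&2; exit 1 ;;
        esac
    )
}
```

The optional path argument exists only so the function can be tried against a copy of the file away from a cluster node.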

Step 2. Prepare your S3 bucket 

I create an S3 bucket, e.g. mys3bucket, with the following structure and contents under a prefix of my choice (package names and version numbers may vary):
packages
├── NICE-GPG-KEY.conf
├── efinstall.config
├── enginframe-2020.0-r58.jar
└── nice-dcv-session-manager-broker-2020.2.78-1.el7.noarch.rpm
scripts
├── 01.install.enginframe.master.sh
├── 02.install.dcv.broker.master.sh
└── post.install.sh
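One way to populate the bucket is with `aws s3 sync` from a local copy of the two folders. The sketch below only prints the commands so they can be reviewed before running; the function name and the prefix my/prefix are placeholders.

```shell
#!/usr/bin/env bash
# Print (do not run) the AWS CLI commands that would upload the local
# "packages" and "scripts" folders to s3://<bucket>/<bucket key>/.
print_upload_commands() {
    local bucket="$1" key="$2" folder
    for folder in packages scripts; do
        printf 'aws s3 sync %s s3://%s/%s/%s\n' "${folder}" "${bucket}" "${key}" "${folder}"
    done
}

print_upload_commands "mys3bucket" "my/prefix"
# prints:
# aws s3 sync packages s3://mys3bucket/my/prefix/packages
# aws s3 sync scripts s3://mys3bucket/my/prefix/scripts
```

Running the printed commands requires the AWS CLI and credentials with write access to the bucket.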

Step 3. Modify or create your ParallelCluster configuration file

As mentioned, the only settings required by my scripts are the following in the [cluster] section: post_install, post_install_args and s3_read_resource:
post_install = s3://<bucket>/<bucket key>/scripts/post.install.sh
post_install_args = '<bucket> <bucket key> <efadmin password (optional)>'
s3_read_resource = arn:aws:s3:::<bucket>/<bucket key>/*
The post.install.sh main script is set as the post_install option value, with its S3 full path, and provided arguments:
a) bucket name 
b) bucket folder/key location
c) efadmin user (primary EnginFrame administrator) password
all separated by spaces. All post-install arguments must be enclosed in a single pair of single quotes, as in the example code.
Finally, the s3_read_resource option grants the master access to the same S3 location to download secondary scripts: first one installing EnginFrame (01.install.enginframe.master.sh) and second one installing DCV Session Manager broker (02.install.dcv.broker.master.sh).
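To avoid quoting mistakes when filling in the placeholders, the three lines can be generated from the bucket name and key. This is only a convenience sketch; the function name is hypothetical, and the output still has to be pasted into the [cluster] section by hand.

```shell
#!/usr/bin/env bash
# Print the three [cluster] settings for a given bucket, key and optional
# efadmin password, with the arguments wrapped in one pair of single quotes.
print_cluster_settings() {
    local bucket="$1" key="$2" password="${3:-}"
    local args="${bucket} ${key}"
    if [ -n "${password}" ]; then
        args="${args} ${password}"
    fi
    printf 'post_install = s3://%s/%s/scripts/post.install.sh\n' "${bucket}" "${key}"
    printf "post_install_args = '%s'\n" "${args}"
    printf 's3_read_resource = arn:aws:s3:::%s/%s/*\n' "${bucket}" "${key}"
}
```

For example, `print_cluster_settings mys3bucket my/prefix` yields a post_install_args line with just the bucket and key, since the password is optional. A password containing spaces or single quotes would not survive this format.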

Note: you may wish to associate a custom role with the ParallelCluster master instead of using the s3_read_resource option.

Note: the ParallelCluster documentation suggests using double quotes for post_install_args. This did not work with the latest version of ParallelCluster available when writing this article, so I’m using single quotes. A fix is in progress, so this will probably change in the near future.

A sample configuration file is provided under the parallelcluster folder of the GitHub repository.

Step 4. Create ParallelCluster

You can now start ParallelCluster creation with your preferred invocation command, e.g.:
pcluster create --norollback --config parallelcluster/config.sample PC291
Hint: when testing, it’s probably better to disable rollback as in the above command line: this will allow you to connect via ssh to the master instance to diagnose problems if something goes wrong with the post-install scripts.

Cleaning up

To avoid incurring future charges, delete idle ParallelCluster instances via its delete command:
pcluster delete --config parallelcluster/config.sample PC291

Post-install scripts details

In this section I’ll list some more details on the script logic. This could be a starting point for customizing, evolving or adding more secondary scripts to the solution. For example, you might want to add a further script to automatically install an HPC application on the ParallelCluster master node.

Main post.install.sh

Post-install script post.install.sh goes through the following steps:

EnginFrame

Provided script 01.install.enginframe.master.sh performs the following steps:

DCV Session Manager Broker

Provided script 02.install.dcv.broker.master.sh performs the following steps:
Optionally, if EnginFrame is installed, it:

Troubleshooting

Detailed output log is available on the master node, in:
You can reach it via ssh, after getting the master node IP address from AWS Console → EC2 → Instances and looking for an instance named Master.
## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.