## Slurm Federation on AWS ParallelCluster 

Building on what you learned in the pcluster-athena++ and pcluster-athena++short notebooks, we will explore how to use Slurm federation on AWS ParallelCluster. 

Many research institutions have existing on-prem HPC clusters that use the Slurm scheduler. Those clusters have a fixed size and sometimes need additional capacity to run workloads. "Bursting into the cloud" is one way to handle those requests. 

In this notebook, we will
1. Build two AWS ParallelClusters - "awscluster" (as a worker cluster) and "onpremcluster" (to simulate an on-prem cluster)
1. *Enable REST on "onpremcluster"
1. *Enable Slurm accounting with MySQL as the data store on "onpremcluster"
1. *Enable Slurmdbd on "awscluster" to point to the Slurm accounting endpoint on "onpremcluster"
1. Create a federation with "awscluster" and "onpremcluster" clusters. 
1. Submit a job from "onpremcluster" to "awscluster"
1. Submit a job from "awscluster" to "onpremcluster"
1. Check job/queue status on both clusters

The steps marked with an * above are executed automatically by the post install scripts (scripts/pcluster_post_install_*.sh). 

<img src="images/SlurmFederation.png">

Here is an illustration of Slurm Federation. In this workshop, we will use an AWS ParallelCluster "onpremcluster" and an RDS database to simulate the on-prem datacenter environment, and we will use another AWS ParallelCluster "awscluster" for cloud HPC. 

In [None]:
import boto3
import botocore
import json
import time
import os
import sys
import base64
import docker
import pandas as pd
import importlib
import project_path # path to helper methods
from lib import workshop
from botocore.exceptions import ClientError
from IPython.display import HTML, display

#sys.path.insert(0, '.')
import pcluster_athena
importlib.reload(pcluster_athena)


# unique name of the pcluster
onprem_pcluster_name = 'onpremcluster'
onprem_config_name = "config3-simple.yaml"
onprem_post_install_script_prefix = "scripts/pcluster_post_install_onprem.sh"
slurm_version="22.05.5"
pcluster_version="3.3.0"

# unique name of the pcluster
aws_pcluster_name = 'awscluster'
aws_config_name = "config3-simple.yaml"
aws_post_install_script_prefix = "scripts/pcluster_post_install_aws.sh"

federation_name = "burstworkshop"
REGION='us-east-1'
# 
!mkdir -p build

# install pcluster cli
!pip install --upgrade aws-parallelcluster==$pcluster_version
!pcluster version


ec2_client = boto3.client("ec2")


## Install nodejs in the current kernel

pcluster3 requires the nodejs executables. We will install them in the current kernel. 

SageMaker Jupyter notebooks come with multiple kernels. We use "conda_python3" in this workshop. If you need to switch to another kernel, please change the kernel name in the following instructions accordingly. 

1. Open a terminal window from File/New/Terminal - this will open a terminal with the "sh" shell.
2. Execute the ```bash``` command to switch to the "bash" shell
3. Execute ```conda activate python3```
4. Execute the following commands (you can copy the following lines and paste them into the terminal)

```
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
chmod +x ~/.nvm/nvm.sh
~/.nvm/nvm.sh
bash
nvm install v16.3.0
node --version
```

After you have installed nodejs, **restart the kernel** by reselecting "conda_python3" in the top right corner of the notebook. Running the following cell should then print the node version, such as "v16.3.0".

In [None]:
!node --version
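
The same check can be made from Python, which is convenient if you want the notebook to stop early when node is missing. This is a minimal sketch, assuming `node --version` prints a string of the form `vMAJOR.MINOR.PATCH`; the helper names `parse_node_version` and `node_version` are hypothetical and not part of the workshop code.

```python
import shutil
import subprocess


def parse_node_version(text):
    """Turn a string such as 'v16.3.0' into a tuple of ints (16, 3, 0)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def node_version():
    """Return the installed node version as a tuple, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return parse_node_version(result.stdout)
```

A call such as `node_version()` returning `None` would suggest that the kernel has not picked up the nvm installation yet.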

In [None]:
# this is used during development, to reload the module after a change in the module
try:
    del sys.modules['pcluster_athena']
except KeyError:
    # ignore if the module is not loaded
    print('Module not loaded, ignore')
    
from pcluster_athena import PClusterHelper


In [None]:
%%time
# create the onprem cluster
onprem_pcluster_helper = PClusterHelper(onprem_pcluster_name, onprem_config_name, onprem_post_install_script_prefix, federation_name=federation_name, slurm_version=slurm_version)
onprem_pcluster_helper.create_before()
!pcluster create-cluster --cluster-name $onprem_pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$onprem_config_name --region $onprem_pcluster_helper.region




In [None]:
cf_client = boto3.client('cloudformation')

waiter = cf_client.get_waiter('stack_create_complete')

try:
    print("Waiting for cluster creation to complete ... ")
    waiter.wait(StackName=onprem_pcluster_name)
except botocore.exceptions.WaiterError as e:
    print(e)
else:
    # only report success when the waiter did not raise
    print("onpremcluster creation completed. ")
onprem_pcluster_helper.create_after()

resp=cf_client.describe_stacks(StackName=onprem_pcluster_name)
outputs=resp["Stacks"][0]["Outputs"]

dbd_host=''
for o in outputs:
    if o['OutputKey'] == 'HeadNodePrivateIP':
        dbd_host = o['OutputValue']
        print("Slurm REST endpoint is on ", dbd_host)
        break
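
The loop above scans the CloudFormation stack outputs for a single key. The same lookup can be sketched as a small helper that works on the `Outputs` list returned by `describe_stacks`. The function name `get_stack_output` is hypothetical, and the sample values are made up for illustration.

```python
def get_stack_output(outputs, key, default=None):
    """Return the OutputValue whose OutputKey matches key, or default if absent."""
    for output in outputs:
        if output.get("OutputKey") == key:
            return output.get("OutputValue")
    return default


# Illustrative data shaped like resp["Stacks"][0]["Outputs"]; the values are made up.
sample_outputs = [
    {"OutputKey": "SomeOtherOutput", "OutputValue": "example"},
    {"OutputKey": "HeadNodePrivateIP", "OutputValue": "10.0.0.10"},
]
dbd_host_example = get_stack_output(sample_outputs, "HeadNodePrivateIP")
```

Returning a default instead of an empty string makes it easier to detect a missing output before the value is passed on to the second cluster.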
        

In [None]:
# copy the ssh key to .ssh 
!cp -f pcluster-athena-key.pem ~/.ssh/pcluster-athena-key.pem
!chmod 400 ~/.ssh/pcluster-athena-key.pem

In [None]:

# create the awscluster - wait until the onprem cluster has finished, because its dbd host is needed
aws_pcluster_helper = PClusterHelper(aws_pcluster_name, aws_config_name, aws_post_install_script_prefix, dbd_host=dbd_host, federation_name=federation_name, slurm_version=slurm_version)
aws_pcluster_helper.create_before()
!pcluster create-cluster --cluster-name $aws_pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$aws_config_name --region $aws_pcluster_helper.region


In [None]:

try:
    print("Waiting for cluster creation to complete ... ")
    waiter.wait(StackName=aws_pcluster_name)
except botocore.exceptions.WaiterError as e:
    print(e)
else:
    # only report success when the waiter did not raise
    print("awscluster creation completed. ")

aws_pcluster_helper.create_after()

In [None]:
# Allow traffic between the head node security groups of the two clusters - this only applies to the current
# configuration, where both clusters are in AWS.
# For a real on-prem environment, you will need to configure your network firewall to allow traffic between the two clusters.
# Each pcluster is created with a set of CloudFormation templates. We can get some detailed information from the stack.

cf_client = boto3.client("cloudformation")
aws_pcluster_head_sg = cf_client.describe_stack_resource(StackName=aws_pcluster_name, LogicalResourceId='HeadNodeSecurityGroup')['StackResourceDetail']['PhysicalResourceId']
onprem_pcluster_head_sg = cf_client.describe_stack_resource(StackName=onprem_pcluster_name, LogicalResourceId='HeadNodeSecurityGroup')['StackResourceDetail']['PhysicalResourceId']

print(aws_pcluster_head_sg)
print(onprem_pcluster_head_sg)

try:
    resp = ec2_client.authorize_security_group_ingress(GroupId=aws_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': onprem_pcluster_head_sg}] } ] ) 
except ClientError  as err:
    print(err , " The security groups might have established trust from previous runs. Ignore.")

try:
    resp = ec2_client.authorize_security_group_ingress(GroupId=onprem_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': aws_pcluster_head_sg}] } ] ) 
except ClientError  as err:
    print(err , " The security groups might have established trust from previous runs. Ignore.")
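
Both calls above pass the same rule shape with the source group swapped. The following is a minimal sketch of building that rule once, assuming the goal is to allow all protocols from a peer security group. `all_traffic_from_group` is a hypothetical helper, and the group IDs are placeholders.

```python
def all_traffic_from_group(source_group_id):
    """Build an IpPermissions list allowing all protocols from one security group."""
    return [
        {
            "IpProtocol": "-1",  # -1 selects all protocols, on all ports
            "UserIdGroupPairs": [{"GroupId": source_group_id}],
        }
    ]


# Placeholder IDs; in the notebook these come from describe_stack_resource.
pairs = [("sg-aaaa1111", "sg-bbbb2222"), ("sg-bbbb2222", "sg-aaaa1111")]
rules = {target: all_traffic_from_group(source) for target, source in pairs}

# Each entry could then be passed to EC2, for example:
# ec2_client.authorize_security_group_ingress(GroupId=target, IpPermissions=rules[target])
```

The same dictionaries can be reused with `revoke_security_group_ingress` during cleanup, so the rules that are added and removed stay identical.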


### Add awscluster to the federation. 

Open two separate terminal windows and use one to ssh into "awscluster" and the other to ssh into "onpremcluster".

Run the following command in each terminal to log in to the headnode of each cluster. Replace $pcluster_name with "awscluster" or "onpremcluster". 

```
pcluster ssh --cluster-name $pcluster_name -i ~/.ssh/pcluster-athena-key.pem --region us-east-1
```

Run the following commands on the awscluster headnode

```
sudo systemctl restart slurmctld 

sudo /opt/slurm/bin/sacctmgr -i add federation burstworkshop clusters=awscluster,onpremcluster
```

Restarting slurmctld will add awscluster to the clusters list, which can take a few seconds. If you get an error when running the second command, wait a few more seconds and run it again.
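
The "wait and run it again" advice amounts to a retry loop. The following is a minimal sketch, assuming it is run on the awscluster headnode and that a non-zero exit code from `sacctmgr` means the cluster has not been registered yet. `retry` and `add_federation` are hypothetical helper names, and the attempt count and delay are arbitrary.

```python
import subprocess
import time


def retry(action, attempts=5, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it returns True or attempts run out; return success."""
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay_seconds)  # give slurmctld time to register the cluster
    return False


def add_federation():
    """Run the sacctmgr command from the text; True when it exits with code 0."""
    cmd = ["sudo", "/opt/slurm/bin/sacctmgr", "-i", "add", "federation",
           "burstworkshop", "clusters=awscluster,onpremcluster"]
    return subprocess.run(cmd).returncode == 0


# On the headnode: retry(add_federation)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.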


## Slurm Federation - job submission

### Submit Job from onpremcluster to awscluster

On the headnode of the **onpremcluster**, execute the following command. 


<div class="alert alert-info">
Since /shared/tmp is owned by "slurm" user, you will need to submit the job as slurm user 
</div>

```
sudo su slurm 
cd /shared/tmp
sbatch -M awscluster batch_test.sh
```


This will submit the job from the **onpremcluster** to **awscluster**. The batch script simply runs the "hostname" command on two nodes, with 4 tasks on each node:
```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --partition=q1
#SBATCH --job-name=test

cd /shared/tmp

srun hostname
srun sleep 60
```
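
With two nodes and four tasks per node, each `srun` step launches 2 × 4 = 8 tasks, so `srun hostname` should print eight lines. The arithmetic can be sketched by reading the `#SBATCH` directives. `parse_sbatch` is a hypothetical helper that handles only the `--key=value` form used in the script above.

```python
def parse_sbatch(script_text):
    """Collect '#SBATCH --key=value' directives into a dict of strings."""
    options = {}
    for line in script_text.splitlines():
        line = line.strip()
        if line.startswith("#SBATCH"):
            arg = line[len("#SBATCH"):].strip()
            if arg.startswith("--") and "=" in arg:
                key, value = arg[2:].split("=", 1)
                options[key] = value
    return options


script = """#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --partition=q1
#SBATCH --job-name=test

srun hostname
"""

opts = parse_sbatch(script)
total_tasks = int(opts["nodes"]) * int(opts["ntasks-per-node"])
print(total_tasks)  # prints 8
```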


### View job status on awscluster

On the headnode of the awscluster, use ```sinfo``` and  ```squeue``` to check the cluster and queue status. You will see something like the following, which indicates that the "awscluster" received the job submission from "onpremcluster" and is allocating the two nodes requested in the batch script. 

<img src="images/sinfo-awscluster.png" width="600"> 

Because the "hostname" command runs on multiple nodes, it will take some time for the nodes to power up. After the job is completed, you should be able to see the list of hostnames in the slurm output file. 

<img src="images/slurm-output.png" width="500"> 
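
Once the job finishes, the output file can be summarized to confirm that both nodes took part. This is a minimal sketch, assuming the file contains one hostname per line. The hostnames below are made up for illustration, and `summarize_hostnames` is a hypothetical helper.

```python
from collections import Counter


def summarize_hostnames(output_text):
    """Count how many times each non-empty line (hostname) appears."""
    return Counter(line.strip() for line in output_text.splitlines() if line.strip())


# Made-up sample; a real run would read the job's slurm output file instead.
sample = "\n".join(["node-a"] * 4 + ["node-b"] * 4)
counts = summarize_hostnames(sample)
```

For the batch script above, two distinct hostnames with four entries each would match the requested two nodes and four tasks per node.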



# Don't forget to clean up

1. Delete the ParallelClusters
2. Delete the RDS database
3. Delete the S3 bucket
4. Delete the secrets used in this exercise

Deleting a VPC is risky, so it is left out of the automated cleanup. If you created a new VPC, clean it up manually. 

In [None]:
# this is used during development, to reload the module after a change in the module
#try:
#    del sys.modules['pcluster_athena']
#except:
#    #ignore if the module is not loaded
#    print('Module not loaded, ignore')
    
#from pcluster_athena import PClusterHelper
# we added these ingress rules after cluster creation; if we don't remove them, pcluster delete will fail
try:
    resp = ec2_client.revoke_security_group_ingress(GroupId=aws_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': onprem_pcluster_head_sg}] } ] ) 
except ClientError  as err:
    print(err , " this is ok , we can ignore")

try:
    resp = ec2_client.revoke_security_group_ingress(GroupId=onprem_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': aws_pcluster_head_sg}] } ] ) 
except ClientError  as err:
    print(err , " this is ok , we can ignore")
    





In [None]:
# if you are running this workshop in your own account and do not want to keep the RDS database and the SSH keys, change the arguments of cleanup_after()
aws_pcluster_helper = PClusterHelper(aws_pcluster_name, aws_config_name, aws_post_install_script_prefix)
!pcluster delete-cluster --cluster-name $aws_pcluster_helper.pcluster_name --region $REGION
aws_pcluster_helper.cleanup_after(KeepRDS=True, KeepSSHKey=True)



In [None]:
onprem_pcluster_helper = PClusterHelper(onprem_pcluster_name, onprem_config_name, onprem_post_install_script_prefix)
!pcluster delete-cluster --cluster-name $onprem_pcluster_helper.pcluster_name --region $REGION
onprem_pcluster_helper.cleanup_after(KeepRDS=True,KeepSSHKey=False)
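
Stack deletion can take several minutes. If you want the notebook to block until a cluster stack is gone, one option is the CloudFormation `stack_delete_complete` waiter. This is a sketch only: `wait_for_stack_deletion` is a hypothetical helper, it assumes the stack name equals the cluster name as in the creation cells, and `cleanup_after()` may already wait on its own.

```python
def wait_for_stack_deletion(cf_client, stack_name, delay=30, max_attempts=120):
    """Block until CloudFormation reports the stack as deleted."""
    waiter = cf_client.get_waiter("stack_delete_complete")
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )


# Example, using the client created earlier in the notebook:
# wait_for_stack_deletion(cf_client, "onpremcluster")
```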

In [None]:
# delete the mungekey created during post_install
REGION=boto3.session.Session().region_name
!aws secretsmanager delete-secret --secret-id munge_key_$federation_name --force-delete-without-recovery --region $REGION


In [None]:
!aws secretsmanager delete-secret --secret-id slurm_token_onprem --force-delete-without-recovery --region $REGION
