{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Slurm Federation on AWS ParallelCluster \n", "\n", "Built upon what you learning in pcluster-athena++ and pcluster-athena++short notebooks, we will explore how to use Slurm federation on AWS ParallelCluster. \n", "\n", "Many research institutions have existing on-prem HPC clusters with Slurm scheduler. Those HPC clusters have a fixed size and sometimes require additional capacity to run workloads. \"Bursting into cloud\" is a way to handle that requests. \n", "\n", "In this notebook, we will\n", "1. Build two AWS ParallelClusters - \"awscluster\" (as a worker cluster) and \"onpremcluster\" (to simulate an on-prem cluster)\n", "1. *Enable REST on \"onpremcluster\"\n", "1. *Enable Slurm accouting with mySQL as data store on \"onpremcluster\"\n", "1. *Enable Slurmdbd on \"awscluster\" to point to the slurm accounting endpoint on \"onpremcluster\"\n", "1. Create a federation with \"awscluster\" and \"onpremcluster\" clusters. \n", "1. Submit a job from \"onpremcluster\" to \"awscluster\"\n", "1. Submit a job from \"awscluster\" to \"onpremcluster\"\n", "1. Check job/queue status on both clusters\n", "\n", "Most of the steps (with an *) listed above are executed automatically in the post install script (scripts/pcluster_post_install_*.sh). \n", "\n", "\n", "\n", "Here is an illustration of Slurm Federation. In this workshop, we will an AWS ParallelCluster \"onpremcluster\" and an RDS database to simulate the on-prem datacenter environment, and we will use another AWS ParallelCluster \"awscluster\" for cloud HPC. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import boto3\n", "import botocore\n", "import json\n", "import time\n", "import os\n", "import sys\n", "import base64\n", "import docker\n", "import pandas as pd\n", "import importlib\n", "import project_path # path to helper methods\n", "from lib import workshop\n", "from botocore.exceptions import ClientError\n", "from IPython.display import HTML, display\n", "\n", "#sys.path.insert(0, '.')\n", "import pcluster_athena\n", "importlib.reload(pcluster_athena)\n", "\n", "\n", "# unique name of the pcluster\n", "onprem_pcluster_name = 'onpremcluster'\n", "onprem_config_name = \"config3-simple.yaml\"\n", "onprem_post_install_script_prefix = \"scripts/pcluster_post_install_onprem.sh\"\n", "slurm_version=\"22.05.5\"\n", "pcluster_version=\"3.3.0\"\n", "\n", "# unique name of the pcluster\n", "aws_pcluster_name = 'awscluster'\n", "aws_config_name = \"config3-simple.yaml\"\n", "aws_post_install_script_prefix = \"scripts/pcluster_post_install_aws.sh\"\n", "\n", "federation_name = \"burstworkshop\"\n", "REGION='us-east-1'\n", "# \n", "!mkdir -p build\n", "\n", "# install pcluster cli\n", "!pip install --upgrade aws-parallelcluster==$pcluster_version\n", "!pcluster version\n", "\n", "\n", "ec2_client = boto3.client(\"ec2\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install nodejs in the current kernal\n", "\n", "pcluster3 requires nodejs executables. We wil linstall that in the current kernal. \n", "\n", "SageMaker Jupyter notebook comes with multiple kernals. We use \"conda_python3\" in this workshop. If you need to switch to another kernal, please change the kernal in the following instructions accordingly. \n", "\n", "1. Open a terminal window from File/New/Ternimal - this will open a terminal with \"sh\" shell.\n", "2. exetute ```bash``` command to switch to \"bash\" shell\n", "3. execute ```conda activate python3```\n", "4. execute the following commands (you can cut and paste the following lines and paste into the terminal)\n", "\n", "```\n", "curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash\n", "chmod +x ~/.nvm/nvm.sh\n", "~/.nvm/nvm.sh\n", "bash\n", "nvm install v16.3.0\n", "node --version\n", "```\n", "\n", "After you installed nodejs in the current kernel, **restart the kernal** by reselecting the \"conda_python3\" on the top right corner of the notebook. You should see the output of the version of node, such as \"v16.9.1\" after running the following cell." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!node --version" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this is used during developemnt, to reload the module after a change in the module\n", "try:\n", " del sys.modules['pcluster_athena']\n", "except:\n", " #ignore if the module is not loaded\n", " print('Module not loaded, ignore')\n", " \n", "from pcluster_athena import PClusterHelper\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "# create the onprem clsuter\n", "onprem_pcluster_helper = PClusterHelper(onprem_pcluster_name, onprem_config_name, onprem_post_install_script_prefix, federation_name=federation_name, slurm_version=slurm_version)\n", "onprem_pcluster_helper.create_before()\n", "!pcluster create-cluster --cluster-name $onprem_pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$onprem_config_name --region $onprem_pcluster_helper.region\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cf_client = boto3.client('cloudformation')\n", "\n", "waiter = cf_client.get_waiter('stack_create_complete')\n", "\n", "try:\n", " print(\"Waiting for cluster creation to complete ... \")\n", " waiter.wait(StackName=onprem_pcluster_name)\n", "except botocore.exceptions.WaiterError as e:\n", " print(e)\n", "\n", "print(\"onpremcluster creation completed. \")\n", "onprem_pcluster_helper.create_after()\n", "\n", "resp=cf_client.describe_stacks(StackName=onprem_pcluster_name)\n", "outputs=resp[\"Stacks\"][0][\"Outputs\"]\n", "\n", "dbd_host=''\n", "for o in outputs:\n", " if o['OutputKey'] == 'HeadNodePrivateIP':\n", " dbd_host = o['OutputValue']\n", " print(\"Slurm REST endpoint is on \", dbd_host)\n", " break;\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# copy the ssh key to .ssh \n", "!cp -f pcluster-athena-key.pem ~/.ssh/pcluster-athena-key.pem\n", "!chmod 400 ~/.ssh/pcluster-athena-key.pem" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# create the awscluster - need to wait till the onprem cluster finish - need the dbd host \n", "aws_pcluster_helper = PClusterHelper(aws_pcluster_name, aws_config_name, aws_post_install_script_prefix, dbd_host=dbd_host, federation_name=federation_name, slurm_version=slurm_version)\n", "aws_pcluster_helper.create_before()\n", "!pcluster create-cluster --cluster-name $aws_pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$aws_config_name --region $aws_pcluster_helper.region\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "try:\n", " print(\"Waiting for cluster to creation to complete ... \")\n", " waiter.wait(StackName=aws_pcluster_name)\n", "except botocore.exceptions.WaiterError as e:\n", " print(e)\n", "\n", "print(\"awscluster creation completed. \")\n", "\n", "aws_pcluster_helper.create_after()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add security group to each cluster security group - this only applies to the current configuration where \n", "# both clusters are in AWS. \n", "# For a real on-prem environment, you will need to configure your network firewall to allow traffic between the two clusters\n", "# Each pcluster is created with a set of cloudformation templates. We can get some detailed information from the stack\n", "\n", "cf_client = boto3.client(\"cloudformation\")\n", "aws_pcluster_head_sg = cf_client.describe_stack_resource(StackName=aws_pcluster_name, LogicalResourceId='HeadNodeSecurityGroup')['StackResourceDetail']['PhysicalResourceId']\n", "onprem_pcluster_head_sg = cf_client.describe_stack_resource(StackName=onprem_pcluster_name, LogicalResourceId='HeadNodeSecurityGroup')['StackResourceDetail']['PhysicalResourceId']\n", "\n", "print(aws_pcluster_head_sg)\n", "print(onprem_pcluster_head_sg)\n", "\n", "try:\n", " resp = ec2_client.authorize_security_group_ingress(GroupId=aws_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': onprem_pcluster_head_sg}] } ] ) \n", "except ClientError as err:\n", " print(err , \" The security groups might have established trust from previous runs. Ignore.\")\n", "\n", "try:\n", " resp = ec2_client.authorize_security_group_ingress(GroupId=onprem_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': aws_pcluster_head_sg}] } ] ) \n", "except ClientError as err:\n", " print(err , \" The security groups might have established trust from previous runs. Ignore.\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add awscluster to the federation. \n", "\n", "Open two seperate terminal windows and use each to ssh into \"awscluster\" and \"onpremcluster\"\n", "\n", "Run the following command in terminal to login to the headnode of each clusters. Replace $pcluster_name with \"awscluster\" or \"onpremcluster\". \n", "\n", "```\n", "pcluster ssh --cluster-name $pcluster_name -i ~/.ssh/pcluster-athena-key.pem --region us-east-1\n", "```\n", "\n", "Run the following commands on awscluster headnode\n", "\n", "```\n", "sudo systemctl restart slurmctld \n", "\n", "sudo /opt/slurm/bin/sacctmgr -i add federation burstworkshop clusters=awscluster,onpremcluster\n", "```\n", "\n", "Restarting slurmctld will add awscluster to the clusters list, which can take a few seconds. if you get an error when running the second command, wait for a few more seconds and run it again.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Slurm Federation - job submission\n", "\n", "### Submit Job from onpremcluster to awscluster\n", "\n", "On the headnode of the **onpremcluster**, execute the following command. \n", "\n", "\n", "
\n", "Since /shared/tmp is owned by \"slurm\" user, you will need to submit the job as slurm user \n", "
\n", "\n", "```\n", "sudo su slurm \n", "cd /shared/tmp\n", "sbatch -M awscluster batch_test.sh\n", "```\n", "\n", "\n", "This will submit the job from the **onpremcluster** to **awscluster**. The batch script simply runs \"hostname\" command on two nodes, with 4 tasks on each node\n", "```\n", "#!/bin/bash\n", "#SBATCH --nodes=2\n", "#SBATCH --ntasks-per-node=4\n", "#SBATCH --partition=q1\n", "#SBATCH --job-name=test\n", "\n", "cd /shared/tmp\n", "\n", "srun hostname\n", "srun sleep 60\n", "```\n", "\n", "\n", "### View job status on awscluster\n", "\n", "On the headnode of the awscluster, use ```sinfo``` and ```squeue``` to check the cluster and queue status. You will see something like the following, which indicates that the \"awscluster\" received the job submission from \"onpremcluster\" and is allocating the two nodes requested in the batch script. \n", "\n", " \n", "\n", "When we are running the \"hostname\" command on multiple nodes , it will take some time for the nodes to power up. After the job is completed, you should be able to see the list of hostnames in the slurm output file. \n", "\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Don't forget to clean up\n", "\n", "1. Delete the ParallelCluster\n", "2. Delete the RDS\n", "3. S3 bucket\n", "4. Secrets used in this excercise\n", "\n", "Deleting VPC is risky, I will leave it out for you to manually clean it up if you created a new VPC. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# this is used during developemnt, to reload the module after a change in the module\n", "#try:\n", "# del sys.modules['pcluster_athena']\n", "#except:\n", "# #ignore if the module is not loaded\n", "# print('Module not loaded, ignore')\n", " \n", "#from pcluster_athena import PClusterHelper\n", "# we added those ingress rules later, if we don't remove them, pcluster delete will fail\n", "try:\n", " resp = ec2_client.revoke_security_group_ingress(GroupId=aws_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': onprem_pcluster_head_sg}] } ] ) \n", "except ClientError as err:\n", " print(err , \" this is ok , we can ignore\")\n", "\n", "try:\n", " resp = ec2_client.revoke_security_group_ingress(GroupId=onprem_pcluster_head_sg , IpPermissions=[ {'FromPort': -1, 'IpProtocol': '-1', 'UserIdGroupPairs': [{'GroupId': aws_pcluster_head_sg}] } ] ) \n", "except ClientError as err:\n", " print(err , \" this is ok , we can ignore\")\n", " \n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# if you are running this workshop in your own account, and you do not want to keep the RDS and the SSHKeys, please change the argument in cleanup_after() \n", "aws_pcluster_helper = PClusterHelper(aws_pcluster_name, aws_config_name, aws_post_install_script_prefix)\n", "!pcluster delete-cluster --cluster-name $aws_pcluster_helper.pcluster_name --region $REGION\n", "aws_pcluster_helper.cleanup_after(KeepRDS=True, KeepSSHKey=True)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "onprem_pcluster_helper = PClusterHelper(onprem_pcluster_name, onprem_config_name, onprem_post_install_script_prefix)\n", "!pcluster delete-cluster --cluster-name $onprem_pcluster_helper.pcluster_name --region $REGION\n", "onprem_pcluster_helper.cleanup_after(KeepRDS=True,KeepSSHKey=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# delete the mungekey created during post_install\n", "REGION=boto3.session.Session().region_name\n", "!aws secretsmanager delete-secret --secret-id munge_key_$federation_name --force-delete-without-recovery --region $REGION\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws secretsmanager delete-secret --secret-id slurm_token_onprem --force-delete-without-recovery --region $REGION\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "conda_python3", "language": "python", "name": "conda_python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 }