+++ title = "c. Memory Scheduling 💾" weight = 23 +++ {{% notice info %}} This feature was launched in AWS ParallelCluster **3.2.0** and can be enabled by enabling **Slurm Memory Based Scheduling Enabled** in the **HeadNode** configuration screen. See [Slurm memory-based scheduling](https://docs.aws.amazon.com/parallelcluster/latest/ug/slurm-mem-based-scheduling-v3.html) in the AWS ParallelCluster docs for more info. {{% /notice %}} Slurm supports memory based scheduling via a `--mem` or `--mem-per-cpu` flag provided at job submission time. This allows scheduling of jobs with high memory requirements, allowing users to guarantee a set amount of memory per-job or per-process. For example users can run: ```bash sbatch --mem-per-cpu=64G -n 8 ... ``` To get 8 vcpus and 64 gigs of memory. In order to add in memory information, we have a managed post-install script that can be setup with Pcluster Manager. This script sets the `RealMemory` to **85%** of the available system memory, allowing 15% to system processes. ### Setup with 3.2.0 When setting up a cluster with version > **3.2.0**, simply toggle **Slurm Memory Based Scheduling Enabled** to on: ![Enable Memory Scheduling](memory-scheduling/HeadNode-Setup.png) Optionally you can setup the specific amount of memory that Slurm configures on each node, however I don't reccomend doing this as it may results in a job over-allocating memory. ### Setup with < 3.2.0 To enable this in versions < **3.2.0**, create a new cluster and in the **HeadNode** configuration screen, click on the "Advanced" dropdown and add in the managed `Memory` script: ![Enable Memory Script](memory-scheduling/memory.png) Then add the following managed IAM policy to the head node: ``` arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess ``` On the last screen your config should look similar to the following, note you'll minimally need `AmazonEC2ReadOnlyAccess` and `https://raw.githubusercontent.com/aws-samples/pcluster-manager/main/resources/scripts/mem.sh` script. ```yaml HeadNode: InstanceType: c5a.xlarge Ssh: KeyName: keypair Networking: SubnetId: subnet-123456789 Iam: AdditionalIamPolicies: - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore - Policy: arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess CustomActions: OnNodeConfigured: Script: >- https://raw.githubusercontent.com/aws-samples/pcluster-manager/main/resources/scripts/multi-runner.py Args: - >- https://raw.githubusercontent.com/aws-samples/pcluster-manager/main/resources/scripts/mem.sh Scheduling: Scheduler: slurm SlurmQueues: - Name: cpu ComputeResources: - Name: cpu-hpc6a48xlarge MinCount: 0 MaxCount: 100 Instances: - InstanceType: hpc6a.48xlarge Efa: Enabled: true Networking: SubnetIds: - subnet-123456789 PlacementGroup: Enabled: true Region: us-east-2 Image: Os: alinux2 ``` ### Test When the cluster has been created you can check the memory settings for each instance: ```bash $ scontrol show nodes | grep RealMemory NodeName=cpu-dy-cpu-hpc6a48xlarge-1 CoresPerSocket=1 ... RealMemory=334233 AllocMem=0 FreeMem=N/A Sockets=96 Boards=1 ... ``` You'll see that for the **hpc6a.48xlarge** instance, which has 384 GB of memory that `RealMemory=334233` or `384 GB * .85 = 334.2 GB`. To schedule a job with memory constraints you can use the `--mem` flag. See the [Slurm sbatch docs](https://slurm.schedmd.com/sbatch.html#OPT_mem) for more info. 
For an interactive allocation, pass the same flag to `salloc`:

```bash
$ salloc --mem 8GB
```

You can see the requested memory for that job by running:

```bash
$ squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %.5m %.5c %R"
             JOBID PARTITION     NAME     USER ST       TIME  NODES MIN_M MIN_C NODELIST(REASON)
                 3       cpu interact ec2-user  R      12:25      1    8G     1 cpu-dy-cpu-hpc6a48xlarge-1
```
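If you want to double-check the 85% figure, you can look up the instance type's nominal memory with the EC2 API and apply the calculation yourself. This only reproduces the expected value; it is not necessarily how the managed `mem.sh` script derives it.

```bash
# Look up the nominal memory of hpc6a.48xlarge (in MiB) and take 85% of it.
aws ec2 describe-instance-types \
  --instance-types hpc6a.48xlarge \
  --query 'InstanceTypes[0].MemoryInfo.SizeInMiB' \
  --output text | awk '{ printf "%d\n", $1 * 0.85 }'
# 393216 MiB * 0.85 ≈ 334233 MiB, matching the RealMemory value above
```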