# HPC WORKSHOP WRF ON AWS with Graviton2 leveraging SPACK

## Introduction

The **Weather Research and Forecasting (WRF)** Model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. If you are looking for additional information, please refer to:

* Main Web Site: http://www.wrf-model.org
* V4 User Guide: https://www2.mmm.ucar.edu/wrf/users/docs/user_guide_v4/contents.html
* WRF Modeling System Tutorial: https://www2.mmm.ucar.edu/wrf/users/tutorial/tutorial.html
* Benchmark results running large scale Weather Forecast models on AWS: https://aws.amazon.com/it/solutions/case-studies/maxar-case-study/

In this workshop we will set up an elastic HPC cluster, leveraging *AWS ParallelCluster*, that can immediately be used to start running weather forecasts. The scripts included in this repo are released under the MIT license and perform all the tasks required to download and compile the dependencies and tools needed to run a full weather forecast, from initialization data download to visualization of the results.

**Spack** is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers or target specific microarchitectures. Learn more [here](https://spack.io/about/).

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License

This library is licensed under the MIT-0 License. See the LICENSE file.

## Cluster Setup with AWS ParallelCluster

This workshop has been prepared for AWS ParallelCluster version 3. The first step is to install the AWS ParallelCluster CLI according to the following guide: https://docs.aws.amazon.com/parallelcluster/latest/ug/install-v3-virtual-environment.html

Once you have AWS ParallelCluster installed (and activated, if you installed it in a Python virtual environment), the next step is to move to the directory generated by git when cloning the repo:

```bash
cd 
```

Now it is time to configure AWS ParallelCluster according to our needs. This is done by editing the template file provided in this repo, *pc_setup_scripts/pcluster-config.template*, and adjusting the parameters related to your AWS account, region and VPC. This template file contains cluster settings that provide good results with WRF, as well as with many other tightly coupled, compute-intensive workloads. The key configuration parameters are (a sketch of how they map onto a ParallelCluster 3 configuration file is shown after this list):

* using c6gn.16xlarge instances for compute nodes. These instances have a low memory to CPU ratio, but they feature Graviton2 CPUs, which optimize simulation cost and reduce carbon footprint, and 100 Gbps network interfaces with EFA
* enabling EFA to take advantage of low latency networking for distributed computing
* configuring all compute nodes in the same placement group to further reduce latency due to physical distance among hosts
* enabling [DCV](https://docs.aws.amazon.com/dcv/latest/adminguide/what-is-dcv.html) in order to be able to visualize computational results directly from the Head node
* limiting the maximum number of nodes to 6 (to avoid generating unexpectedly large clusters)
* configuring 0 compute nodes at rest (all compute nodes are shut down when there are no jobs submitted to the scheduler)

For the remaining AWS ParallelCluster configuration parameters we will use the default values (there is no need to specify them in the config file).
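The following is only an illustrative sketch of how the settings above could be expressed in an AWS ParallelCluster 3 configuration file; it is not the actual *pc_setup_scripts/pcluster-config.template* shipped with this repo. The Head node size, subnet IDs and key name are placeholders to be replaced with your own values.

```bash
# Illustrative only: write a minimal ParallelCluster 3 style configuration that
# reflects the settings discussed above (c6gn.16xlarge compute nodes, EFA,
# placement group, DCV, 0 compute nodes at rest, max 6 nodes).
cat > example-cluster-config.yaml <<'EOF'
Region: eu-west-1                  # pick a region where C6gn instances are available
Image:
  Os: alinux2
HeadNode:
  InstanceType: c6g.4xlarge        # example Head node size; the repo template may differ
  Networking:
    SubnetId: subnet-xxxxxxxx      # placeholder: your subnet
  Ssh:
    KeyName: my-key                # placeholder: your EC2 key pair
  Dcv:
    Enabled: true                  # DCV for remote visualization on the Head node
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: c6gn
          InstanceType: c6gn.16xlarge
          MinCount: 0              # 0 compute nodes at rest
          MaxCount: 6              # cap the cluster size
          Efa:
            Enabled: true          # low-latency networking
      Networking:
        SubnetIds:
          - subnet-xxxxxxxx        # placeholder: your subnet
        PlacementGroup:
          Enabled: true            # keep compute nodes physically close
EOF
```

In practice the template provided with the workshop already encodes equivalent settings, so you only need to adjust the account, region, VPC/subnet and key related fields.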
### Warning

* Several of these settings will result in higher cost. Please review [EC2 costs](https://aws.amazon.com/ec2/pricing/) prior to creation.
* The region has to be selected according to [Amazon Elastic Compute Cloud (EC2) C6gn Instances availability](https://aws.amazon.com/it/about-aws/global-infrastructure/regional-product-services).

## Create the cluster

We are now ready to use AWS ParallelCluster to spin up our new cluster for running weather forecasts by typing:

```bash
mkdir -p $HOME/.parallelcluster
cp ./pc_setup_scripts/pcluster-config.template $HOME/.parallelcluster/config
pcluster create-cluster -r -n wrf-workshop -c $HOME/.parallelcluster/config
```

AWS ParallelCluster will create the components highlighted in the following picture, by leveraging AWS CloudFormation:

![AWS ParallelCluster architecture overview](./pictures/ParallelClusterArchitecture.png)

Cluster spin-up will take approximately 30 minutes, because Spack is downloaded and installed during this step.

## Log into the cluster using DCV

Once the cluster is up and running we can log onto it leveraging DCV. DCV allows logging onto the Head node using a web browser and a signed URL returned by the following command:

```bash
pcluster dcv-connect -r -n wrf-workshop --key-path 
```

## Install WRF

Open a terminal within DCV, go to the filesystem shared across all nodes and run the install script:

```bash
cd /shared/hpc-workshop-wrf/wrf_setup_scripts/
bash install_wrf.sh
```

This script leverages Spack to install all the required software: WPS, wgrib, WRF and ncview. It will take about 50 minutes to complete the download, compile and install steps.

## Run a weather forecast simulation

### Setup Environment variables

Load the environment variables related to the WRF set-up:

```bash
source /shared/setup_env.sh
```

This step has to be executed for each and every new window/session we open on the Head node. You can avoid it by appending that command to ~/.bashrc:

```bash
echo "source /shared/setup_env.sh" >> ~/.bashrc
echo "echo 'WRF ENVIRONMENT INITIALIZED'" >> ~/.bashrc
```

### Download reference data

In order to be able to run a weather forecast on a specific region we need:

* Static Geographical Data: a description of the land surface in that region (i.e. lakes, forests, cities, hills, snow, mountains, ...). Since these data do not vary frequently, they are downloaded by the install_wrf script.
* Gridded Meteorological Data coming from a large scale, coarse-grained forecasting system, which provides our starting conditions.

#### Download Gridded Meteorological Data

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). The entire globe is covered by the GFS at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which is used by the operational forecasters who predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid points for forecasts between one week and two weeks. Gridded data are available for download through the NOAA National Operational Model Archive and Distribution System (NOMADS). As with most works of the U.S. government, GFS data is not copyrighted and is available for free in the public domain under provisions of U.S. law. Because of this, the model serves as the basis for the forecasts of numerous private, commercial, and foreign weather companies. GFS data can be downloaded using the following script; the downloaded data refer to the date on which the script is run.
This script should be run every time we want to make a new forecast for the coming days.

```bash
get_noaa_grib_files.sh
```

Downloaded data are saved in: /shared/FORECAST/download/

### Configure specific geographic area to cover with the forecast

The next step is to set up the program configuration data (i.e. namelist.wps and namelist.input) according to the area and the timeframe for which we want to run our forecast. The following script automatically generates the required configuration files for a map centered on the Mediterranean sea (latitude 40, longitude 14), covering central and southern Europe and northern Africa.

```bash
prepare_config.sh
```

### Generate reference data for selected domain

In this step we extract the data relevant to us. We first set up some environment variables:

```bash
ulimit -s unlimited
day=$(date +%Y%m%d)
WPSWORK=${TARGET_DIR}/preproc
WRFWORK=${TARGET_DIR}/run
DIRGFS=${SHARED_DIR}/FORECAST/download/$day
cd $WPSWORK
# Cleanup data related to previous runs (in case they exist)
rm -f FILE*
rm -f PFILE*
rm -f met_em*
```

geogrid extracts the relevant soil data from the global dataset:

```bash
./geogrid.exe 2>&1 |tee geogrid.$day.log
```

The GFS data are then copied to the preprocessing directory in order to be filtered for our region of interest:

```bash
cd $DIRGFS
cp -f GRIBFILE* $WPSWORK
cd $WPSWORK
./ungrib.exe 2>&1 |tee ungrib.$day.log
```

and mixed with the soil related data:

```bash
./metgrid.exe 2>&1 |tee metgrid.$day.log
mv met_em* $WRFWORK
```

### Run Weather Forecast

The last data preparation step is performed by real.exe, which prepares the Initial and Boundary Condition files for later processing.

```bash
cd $WRFWORK
./real.exe 2>&1 |tee real.$day.log
```

We are now ready to submit a WRF job using the pre-installed scheduler (i.e. Slurm):

```bash
sbatch ${SCRIPTDIR}/slurm_run_wrf.sh
```

With this statement we are asking the scheduler to run the job; the job details are defined within the slurm_run_wrf script. Hereafter you can see the content of that script.

```bash
#!/bin/bash
#SBATCH --error=job.err
#SBATCH --output=job.out
#SBATCH --time=24:00:00
#SBATCH --job-name=wrf
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1

cd /shared/FORECAST/domains/test/run
mpirun ./wrf.exe
```

This script asks the scheduler to run the job on 4 nodes, starting 64 processes on each node (one per core). You can also try changing those parameters to see how forecast performance is affected. The integration between the scheduler and AWS ParallelCluster checks whether there are enough resources to accommodate the job and, if not, spins up new instances according to the job's needs. Output and log files are saved under the $WRFWORK directory.

We can run the same forecast using a different number of cores in order to understand how WRF scales on AWS. Furthermore, AWS ParallelCluster allows two different usage scenarios:

- a single AWS ParallelCluster used to run multiple jobs in parallel, leveraging the job scheduler
- multiple AWS ParallelCluster clusters running jobs independently

This, together with the virtually unlimited resources available on AWS, allows running multiple forecasts with different configuration parameters in order to evaluate whether the forecast is stable (different set-ups converge to similar results) or not (small changes in the config parameters lead to completely different results).

### Explore results

After the forecast is completed we can have a look at WRF's output files using ncview, a graphical application allowing visualization of WRF output.
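Before starting ncview, it can be useful to verify that the Slurm job has actually finished. The checks below are only a suggestion; they assume the job was submitted with the slurm_run_wrf.sh script shown above (job name `wrf`) and that, as described earlier, logs and output land in the $WRFWORK directory.

```bash
# Optional sanity checks before visualization (assumes job name "wrf" and that
# logs/output are written under $WRFWORK, as described above).
squeue --name=wrf      # only the header line means the wrf job is no longer queued or running
cd $WRFWORK
tail rsl.out.0000      # the rank-0 WRF log; a clean run ends with "SUCCESS COMPLETE WRF"
ls -lh wrfout_*        # the NetCDF files produced by wrf.exe, which ncview will open
```

Once the run has finished, open the output files with ncview: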
```bash
cd $WRFWORK
ncview wrfout*
```

Hereafter a few screenshots:

Humidity (QVAPOR)

![QVAPOR](./pictures/QVAPOR.JPG)

Soil Level Temperature (SLT)

![Soil Level Temperature](./pictures/SoilLevelTemperature.JPG)

Soil Level Temperature (SLT) over time on a specific map point

![Soil Level Temperature for a selected point over Time](./pictures/SoillevelTemperature_over_time.JPG)

The following table shows 3 different tests for the same forecast, involving different levels of parallelism and using gcc.

| Number of Cores | WRF Elapsed Time (gcc) |
|-----------------|:----------------------:|
| 128 (2 nodes)   | 6000 sec.              |
| 256 (4 nodes)   | 3500 sec.              |
| 216 (6 nodes)   | 2600 sec.              |

### Cluster Cleanup

To remove the cluster we can use the following command, issued from the machine where the AWS ParallelCluster CLI is installed:

```bash
pcluster delete-cluster -n wrf-workshop -r 
```

This command deletes the Head node, and you will lose any forecast data. Alternatively, the Head node can simply be switched off when not in use and switched on again to start processing a new forecast.
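If you plan to reuse the cluster, a possible way to pause it between forecasts (rather than deleting it) is sketched below. This is only a suggestion based on standard ParallelCluster 3 and AWS CLI commands; the region and the Head node instance ID are placeholders to fill in with your own values (the instance ID can be found in the EC2 console or in the `pcluster describe-cluster` output).

```bash
# Sketch: pause the cluster instead of deleting it. <region> and the instance
# ID below are placeholders.

# Stop the Slurm compute fleet so no new compute nodes are launched.
pcluster update-compute-fleet --cluster-name wrf-workshop --status STOP_REQUESTED --region <region>

# Stop the Head node through EC2; replace the ID with your Head node's instance ID.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Later: start the Head node again and re-enable the compute fleet.
aws ec2 start-instances --instance-ids i-0123456789abcdef0
pcluster update-compute-fleet --cluster-name wrf-workshop --status START_REQUESTED --region <region>
```

While the Head node is stopped you are typically charged only for its attached storage, so this can be a convenient middle ground between keeping the cluster running and deleting it entirely.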