# Job Analyzer

[JobAnalyzer.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/JobAnalyzer.py) performs the cost simulation using the outputs from [AcceleratorLogParser.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/AcceleratorLogParser.py), [LSFLogParser.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/LSFLogParser.py) or [SlurmLogParser.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/SlurmLogParser.py).
It produces an hour-by-hour cost simulation, placing the output in the `output/` subfolder (by default).
For convenience, the analyzer can call the parser and analyze the output in one step; however, we recommend performing the analysis in separate stages (see "How data is analyzed" in [index.md](index.md)).

## Video walkthrough

The fastest way to learn how to use JobAnalyzer.py to simulate your HPC costs is to watch this short walkthrough.

[![Video walkthrough - Running the Cost Simulation](https://img.youtube.com/vi/mzE382N0v9A/0.jpg)](http://www.youtube.com/watch?v=mzE382N0v9A "HPC Cost Simulator - Running the cost simulation")

Once you've run the tools, you'll see the `hourly_stats.xlsx` file under your output subdirectory.
To learn how to use this file and apply cost optimizations, go to the [Outputs](https://github.com/aws-samples/hpc-cost-simulator/blob/main/docs/Output.md) documentation page and watch the video walkthrough there.

## Prerequisites

[JobAnalyzer.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/JobAnalyzer.py) relies on:

1. The virtual environment created by the setup script. Run `source setup.sh` to set up the virtual environment.
2. `config.yml`, which defines the configuration of the analysis. For more details on the configuration file, see the [configuration documentation](config.md).
3. [instance_type_info.json](https://github.com/aws-samples/hpc-cost-simulator/blob/main/instance_type_info.json), which contains instance type details and pricing. The file is part of the repository, but if you want to download an updated list of instances and their prices, see [Updating the Instance Type Information](UpdateInstanceDatabase.md).

# Performing the cost simulation

Once you have generated a CSV file from your scheduler (see instructions for [IBM LSF](LSFLogParser.md), [SchedMD Slurm](SlurmLogParser.md) and [Altair Engineering Accelerator](AcceleratorLogParser.md)), you can perform the cost simulation (step 3) using [JobAnalyzer.py](https://github.com/aws-samples/hpc-cost-simulator/blob/main/JobAnalyzer.py).

To parse the CSV file into the final anonymized Excel report, run:

```
source setup.sh
./JobAnalyzer.py csv --input-csv INPUT_CSV_FILE_NAME
```

## Outputs

By default, HCS places all output files in the `output/` folder (this can be changed using the `--output-dir` parameter to `JobAnalyzer.py`).

**Note:** The output folder will be overwritten without prompting you for approval.

## Full Syntax

Arguments provided to `JobAnalyzer.py` are required in a specific order:

```
./JobAnalyzer.py
```

These are the common arguments.

```
usage: JobAnalyzer.py [-h] [--starttime STARTTIME] [--endtime ENDTIME] [--config CONFIG] [--acknowledge-config]
                      [--output-dir OUTPUT_DIR] [--output-csv OUTPUT_CSV] [--debug]
                      parser ...

Analyze jobs

positional arguments:
  parser                Choose the kind of information to parse. ./JobAnalyzer.py -h for parser specific
                        arguments.
    accelerator         Parse Accelerator (nc) job information
    csv                 Parse CSV from already parsed job information.
    lsf                 Parse LSF logfiles
    slurm               Parse Slurm job information
    hourly_stats        Parse hourly_stats file so can create Excel workbook (xlsx).

optional arguments:
  -h, --help            show this help message and exit
  --starttime STARTTIME
                        Select jobs after the specified time. Format YYYY-MM-DDTHH:MM:SS (default: None)
  --endtime ENDTIME     Select jobs before the specified time. Format YYYY-MM-DDTHH:MM:SS (default: None)
  --config CONFIG       Configuration file. (default: ./config.yml)
  --acknowledge-config  Acknowledge configuration file contents so don't get prompt. (default: False)
  --output-dir OUTPUT_DIR
                        Directory where output will be written (default: output)
  --output-csv OUTPUT_CSV
                        CSV file with parsed job completion records (default: None)
  --debug, -d           Enable debug mode (default: False)
```

### Arguments that apply to all schedulers

When used, these parameters must precede the parser type:

```
--starttime STARTTIME    Select jobs after the specified time. Format YYYY-MM-DDTHH:MM:SS (default: None)
--endtime ENDTIME        Select jobs before the specified time. Format YYYY-MM-DDTHH:MM:SS (default: None)
--config CONFIG          Configuration file. (default: ./config.yml)
--output-dir OUTPUT_DIR  Directory where output will be written (default: output)
--output-csv OUTPUT_CSV  CSV file with parsed job completion records (default: None)
--debug, -d              Enable debug mode (default: False)
--help, -h               Show help message
```
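For example, here is a minimal sketch of a run that restricts the analysis to one year and writes to a custom folder; the time range, output folder and `INPUT_CSV_FILE_NAME` are placeholders you would replace with your own values:

```
# Global options come first, then the parser type and its arguments.
./JobAnalyzer.py --starttime 2023-01-01T00:00:00 --endtime 2023-12-31T23:59:59 \
    --output-dir output-2023 \
    csv --input-csv INPUT_CSV_FILE_NAME
```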
### Parser Type

The tool supports 5 parser types:

```
accelerator     Parse Accelerator (nc) job information
lsf             Parse LSF accounting records (lsb.acct fields)
slurm           Parse Slurm job information
csv             Parse CSV from previously parsed job information.
hourly_stats    Parse the hourly output files from a previous run
```

### Parser-Specific Arguments - Accelerator

```
usage: JobAnalyzer.py accelerator [-h] [--default-mem-gb DEFAULT_MEM_GB]
                                  (--sql-output-file SQL_OUTPUT_FILE | --sql-input-file SQL_INPUT_FILE)

optional arguments:
  -h, --help            show this help message and exit
  --default-mem-gb DEFAULT_MEM_GB
                        Default amount of memory (in GB) requested for jobs. (default: 0.0)
  --sql-output-file SQL_OUTPUT_FILE
                        File where the output of sql query will be written. Cannot be used with
                        --sql-input-file. Required if --sql-input-file not set. Command to create file:
                        nc cmd vovsql_query -e "select jobs.id, jobs.submittime, jobs.starttime,
                        jobs.endtime, resources.name, jobs.exitstatus, jobs.maxram, jobs.maxvm,
                        jobs.cputime, jobs.susptime from jobs inner join resources on
                        jobs.resourcesid=resources.id" > SQL_OUTPUT_FILE (default: None)
  --sql-input-file SQL_INPUT_FILE
                        File with the output of sql query so can process it offline. Cannot be used with
                        --sql-output-file. Required if --sql-output-file not set. (default: None)
```

### Parser-Specific Arguments - LSF

```
usage: JobAnalyzer.py lsf [-h] [--logfile-dir LOGFILE_DIR] --default-max-mem-gb DEFAULT_MAX_MEM_GB

optional arguments:
  -h, --help            show this help message and exit
  --logfile-dir LOGFILE_DIR
                        LSF logfile directory (default: None)
  --default-max-mem-gb DEFAULT_MAX_MEM_GB
                        Default maximum memory for a job in GB. (default: None)
```

### Parser-Specific Arguments - Slurm

```
usage: JobAnalyzer.py slurm [-h] [--slurm-root SLURM_ROOT]
                            [--sacct-output-file SACCT_OUTPUT_FILE | --sacct-input-file SACCT_INPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --slurm-root SLURM_ROOT
                        Directory that contains the Slurm bin directory. (default: None)
  --sacct-output-file SACCT_OUTPUT_FILE
                        File where the output of sacct will be written. Cannot be used with
                        --sacct-input-file. Required if --sacct-input-file not set. (default: None)
  --sacct-input-file SACCT_INPUT_FILE
                        File with the output of sacct so can process sacct output offline. Cannot be used
                        with --sacct-output-file. Required if --sacct-output-file not set. (default: None)
```

**Note:** The tool will call `sacct` to get the accounting logs. If you don't have it installed on the machine, see the [SlurmLogParser.py documentation](SlurmLogParser.md) for details on how to save them to a CSV file.
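For illustration, here are hedged example invocations of the scheduler-specific parsers built only from the flags documented above. `SQL_OUTPUT_FILE`, `LOGFILE_DIR` and `SACCT_OUTPUT_FILE` are placeholders, the `16` GB value is just an example, and any global options would precede the parser name as described earlier:

```
# Accelerator: process a previously saved vovsql_query output offline
# (see the --sql-output-file help text above for the query command).
./JobAnalyzer.py accelerator --sql-input-file SQL_OUTPUT_FILE

# LSF: parse the LSF accounting logfiles; --default-max-mem-gb is required
# (16 GB here is only an example value).
./JobAnalyzer.py lsf --logfile-dir LOGFILE_DIR --default-max-mem-gb 16

# Slurm: process a previously saved sacct output offline.
./JobAnalyzer.py slurm --sacct-input-file SACCT_OUTPUT_FILE
```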
## What's Next?

Once completed, you can find your Excel report under `output/hourly_stats.xlsx`. You can share it with your AWS account team (recommended) for further cost optimization guidance, or learn more about the data in the workbook on the [Output](Output.md) documentation page.
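As a quick sanity check after a run, you can confirm that the report was generated (assuming the default output directory):

```
# The report lands here unless --output-dir was changed.
ls -lh output/hourly_stats.xlsx
```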