= Test Performance
:toc:
:icons:
:linkattrs:
:imagesdir: ./../resources/images

== Summary

This section tests the performance of the Amazon FSx for Lustre file system.

== Duration

NOTE: It will take approximately 15 minutes to complete this section.

== Step-by-step Guide

IMPORTANT: Read through all steps below before continuing.

=== Examine file system performance

. *_Open_* the link:https://console.aws.amazon.com/fsx/[Amazon FSx] console and *_select_* the link of the *File system Name* or *File system ID*.
. In the *Summary* section, *_review_* the values for the following file system attributes:
+
[cols="3,10"]
|===
| Attribute | Value

| Deployment type | Persistent
| Data compression type | NONE
| Storage type | SSD
| Storage capacity | 7.2 TiB
| Throughput per unit of storage | 50 MB/s/TiB
| Total throughput | 360 MB/s
|===
+
. *_Open_* the link:https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html#fsx-aggregate-perf[Amazon FSx for Lustre User Guide] and *_scroll down_* to the *File system performance for SSD storage options* table. *_Review_* the table and the different performance attributes.
. Using the file system attributes from the previous table and the performance that FSx for Lustre is designed to achieve from the Amazon FSx for Lustre User Guide, *_complete_* the table below to calculate the performance attributes of the workshop file system.
+
.PERSISTENT-50 7.2 TiB file system{counter2:index:0}
[cols="e,e,e,e,e,e"]
|===
s|Storage capacity 2+>s|Network throughput (MB/s) s|Cache storage (GiB) 2+>s|Disk throughput (MB/s)
| s|Baseline s|Burst | s|Baseline s|Burst

|Per TiB |250 |1300 |2.2 RAM |50 |240
|7.2 TiB | | | | |
m|For example m|250x7.2= m|1300x7.2= m|2.2x7.2= m|50x7.2= m|240x7.2=
|===
+
NOTE: The *Total throughput* value in the *Summary* section of the Amazon FSx console is the *Disk throughput (MB/s) Baseline* value, which is the least performant attribute of the file system. Depending on the access pattern (for example, reads served from cache storage), the file system can perform well above the disk baseline throughput, up to the disk burst throughput and the network baseline and burst throughputs.
+
. As you complete the following performance tests, compare your test results with the designed performance attributes you calculated in the previous table.
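+
If you want a quick check of your arithmetic, the sketch below simply multiplies the per-TiB figures from the table above by the 7.2 TiB storage capacity. The `awk` one-liner is an illustration only and is not part of the workshop assets.
+
[source,bash]
----
# Multiply each per-TiB figure by the storage capacity (7.2 TiB).
# The disk baseline result (360 MB/s) matches the Total throughput value
# shown in the Summary section of the Amazon FSx console.
awk 'BEGIN {
  cap = 7.2                                                # storage capacity in TiB
  printf "Network throughput baseline : %8.2f MB/s\n", 250  * cap
  printf "Network throughput burst    : %8.2f MB/s\n", 1300 * cap
  printf "Cache storage (RAM)         : %8.2f GiB\n",  2.2  * cap
  printf "Disk throughput baseline    : %8.2f MB/s\n", 50   * cap
  printf "Disk throughput burst       : %8.2f MB/s\n", 240  * cap
}'
----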

=== Smallfile tests

TIP: If you have the two (2) session windows open from the previous section, use the existing session windows. Do not open two (2) more session windows.

. Open two (2) *SSH terminal* or *EC2 Instance Connect* session windows connected to the *Linux Instance*.
+
Start `*nload*` in one of the session windows.
+
[source,bash]
----
nload -u M
----
+
. Write, read, stat, append, rename, and delete a large number of small files. *_Run_* the following commands in order and review the results.
+
link:https://github.com/distributed-system-analysis/smallfile[smallfile] is a distributed, metadata-intensive workload generator for POSIX-like file systems. It is licensed under the Apache License, Version 2.0.
+
*_Monitor_* throughput in real time by going back to the *SSH terminal* or *EC2 Instance Connect* session window running *nload* during each *smallfile* run.
+
Answer the following questions for each *smallfile* run:
+
* How long did it take (e.g. elapsed time)?
* What was the IOPS?
* What was the throughput (e.g. MiB/sec)?
+
- Write (create) ~316200 files
+
[source,bash]
----
# generate short random names for the job directory and file prefix
_job_name=$(echo $(uuidgen)| grep -o ".\{6\}$")
_prefix=$(echo $(uuidgen)| grep -o ".\{6\}$")
_path=/fsx/${_job_name}

mkdir -p ${_path}

# workload parameters: 32 threads, 10000 files per thread, 64 (KB) file size
_threads=32
_file_size=64
_file_count=10000
_operation=create
_same_dir=N
_hash_into_dirs=Y

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----
+
- Read ~316200 files
+
[source,bash]
----
# read the files created by the previous run
_operation=read

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----
+
- Stat ~316200 files
+
[source,bash]
----
# stat the existing files
_operation=stat

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----
+
- Append ~316200 files
+
[source,bash]
----
# append to the existing files
_operation=append

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----
+
- Rename ~316200 files
+
[source,bash]
----
# rename the existing files
_operation=rename

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----
+
- Delete-renamed ~316200 files
+
[source,bash]
----
# delete the renamed files
_operation=delete-renamed

sudo python3 ~/smallfile/smallfile_cli.py \
--operation ${_operation} \
--threads ${_threads} \
--file-size ${_file_size} \
--files ${_file_count} \
--same-dir ${_same_dir} \
--hash-into-dirs ${_hash_into_dirs} \
--prefix ${_prefix} \
--dirs-per-dir ${_file_count} \
--files-per-dir ${_file_count} \
--top ${_path}
----

=== dd tests

. Use dd to generate ~4.3 GB of data using 1 and 2 threads.
+
[source,bash]
----
# 1 thread: one file written as 4096 x 1 MiB blocks
_job_name=$(echo $(uuidgen)| grep -o ".\{6\}$")
_bs=1024K
_count=4096
_sync=conv=fsync
_threads=1
_path=/fsx/${_job_name}

mkdir -p ${_path}/{1..1}

time seq 1 ${_threads} | parallel --will-cite -j ${_threads} sudo dd if=/dev/zero of=${_path}/{}/dd-$(date +%Y%m%d%H%M%S.%3N) bs=${_bs} count=${_count} ${_sync}
----
+
[source,bash]
----
# 2 threads: two files written in parallel, 2048 x 1 MiB blocks each
_job_name=$(echo $(uuidgen)| grep -o ".\{6\}$")
_bs=1024K
_count=2048
_sync=conv=fsync
_threads=2
_path=/fsx/${_job_name}

mkdir -p ${_path}/{1..2}

time seq 1 ${_threads} | parallel --will-cite -j ${_threads} sudo dd if=/dev/zero of=${_path}/{}/dd-$(date +%Y%m%d%H%M%S.%3N) bs=${_bs} count=${_count} ${_sync}
----
+
. How long did it take to generate ~4.3 GB of data using 1 and 2 threads?
. How important is it to use parallel threads to access the Lustre file system?
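+
One reason parallelism matters is that Lustre spreads data across multiple object storage targets (OSTs), and each client connection and OST has its own limits, so a single writer typically cannot reach the file system's aggregate throughput. If you want to look at the layout behind the `/fsx` mount, the sketch below uses the standard Lustre client tools. It is an illustration only; it assumes the `lfs` utility is available on the instance (it normally is wherever a Lustre file system is mounted) and reuses the `${_path}` variable from the most recent dd run:
+
[source,bash]
----
# List the OSTs backing the file system and how full each one is
lfs df -h /fsx

# Show the default stripe layout of the most recent dd job directory
lfs getstripe -d ${_path}
----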

=== ior tests

. Use ior to generate 32 GiB of data using 1, 2, and 4 threads.
+
TIP: Monitor real-time throughput using the *EC2 Instance Connect* or *SSH terminal* session window with `*nload*` running.
+
[source,bash]
----
# 1 thread: 32768 x 1 MiB segments = 32 GiB
_job_name=ior
_segment_count=32768
_threads=1
_path=/fsx/${_job_name}

mkdir -p ${_path}
cd /fsx

mpirun --npernode ${_threads} --oversubscribe ior --posix.odirect -t 1m -b 1m -s ${_segment_count} -g -v -w -i 1 -F -k -D 0 -o ${_path}/ior.bin
----
+
[source,bash]
----
# 2 threads: 2 x 16384 x 1 MiB segments = 32 GiB
_job_name=$(echo $(uuidgen)| grep -o ".\{6\}$")
_segment_count=16384
_threads=2
_path=/fsx/${_job_name}

mkdir -p ${_path}
cd /fsx

mpirun --npernode ${_threads} --oversubscribe ior --posix.odirect -t 1m -b 1m -s ${_segment_count} -g -v -w -i 1 -F -k -D 0 -o ${_path}/ior.bin
----
+
[source,bash]
----
# 4 threads: 4 x 8192 x 1 MiB segments = 32 GiB
_job_name=$(echo $(uuidgen)| grep -o ".\{6\}$")
_segment_count=8192
_threads=4
_path=/fsx/${_job_name}

mkdir -p ${_path}
cd /fsx

mpirun --npernode ${_threads} --oversubscribe ior --posix.odirect -t 1m -b 1m -s ${_segment_count} -g -v -w -i 1 -F -k -D 0 -o ${_path}/ior.bin
----
+
. How long did it take to generate 32 GiB of data using 1, 2, and 4 threads? ior should report results similar to these:
+
[source,bash]
----
threads=1

Results:

access    bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------    ---------  ----     ----------  ----------  ---------  -------   --------  --------  --------  ----
Commencing write performance test:
write     539.01     539.02   59.37       1024.00     1024.00    0.000479  59.37     0.000387  59.37     0
Max Write: 539.01 MiB/sec (565.19 MB/sec)

Summary of all tests:
Operation  Max(MiB)  Min(MiB)  Mean(MiB)  StdDev  Max(OPs)  Min(OPs)  Mean(OPs)  StdDev  Mean(s)   Stonewall(s)  Stonewall(MiB)  Test#  #Tasks  tPN  reps  fPP  reord  reordoff  reordrand  seed  segcnt  blksiz   xsize    aggs(MiB)  API    RefNum
write      539.01    539.01    539.01     0.00    539.01    539.01    539.01     0.00    59.36833  NA            NA              0      1       1    1     1    0      1         0          0     32000   1048576  1048576  32000.0    POSIX  0
----
+
[source,bash]
----
threads=2

Results:

access    bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------    ---------  ----     ----------  ----------  ---------  -------   --------  --------  --------  ----
Commencing write performance test:
write     986.60     986.64   32.43       1024.00     1024.00    0.001054  32.43     0.000392  32.43     0
Max Write: 986.60 MiB/sec (1034.52 MB/sec)

Summary of all tests:
Operation  Max(MiB)  Min(MiB)  Mean(MiB)  StdDev  Max(OPs)  Min(OPs)  Mean(OPs)  StdDev  Mean(s)   Stonewall(s)  Stonewall(MiB)  Test#  #Tasks  tPN  reps  fPP  reord  reordoff  reordrand  seed  segcnt  blksiz   xsize    aggs(MiB)  API    RefNum
write      986.60    986.60    986.60     0.00    986.60    986.60    986.60     0.00    32.43464  NA            NA              0      2       2    1     1    0      1         0          0     16000   1048576  1048576  32000.0    POSIX  0
----
+
[source,bash]
----
threads=4

Results:

access    bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
------    ---------  ----     ----------  ----------  ---------  -------   --------  --------  --------  ----
Commencing write performance test:
write     1370.74    1370.85  23.34       1024.00     1024.00    0.001342  23.34     0.000453  23.34     0
Max Write: 1370.74 MiB/sec (1437.33 MB/sec)

Summary of all tests:
Operation  Max(MiB)  Min(MiB)  Mean(MiB)  StdDev  Max(OPs)  Min(OPs)  Mean(OPs)  StdDev  Mean(s)   Stonewall(s)  Stonewall(MiB)  Test#  #Tasks  tPN  reps  fPP  reord  reordoff  reordrand  seed  segcnt  blksiz   xsize    aggs(MiB)  API    RefNum
write      1370.74   1370.74   1370.74    0.00    1370.74   1370.74   1370.74    0.00    23.34499  NA            NA              0      4       4    1     1    0      1         0          0     8000    1048576  1048576  32000.0    POSIX  0
----
+
. How much read and write throughput was achieved using 1, 2, and 4 threads?
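+
The runs above measure write throughput only (the `-w` flag); the `-k` flag keeps the test files on the file system. If you also want a read number, one possible approach is to re-run ior with `-r` (read) instead of `-w` against the files left behind by the single-thread run, which wrote to the fixed path `/fsx/ior`. This is only a sketch, not one of the scripted workshop steps, and it assumes those files are still present:
+
[source,bash]
----
# Read back the 32 GiB written by the single-thread run (files kept by -k under /fsx/ior)
_segment_count=32768
_threads=1
_path=/fsx/ior

cd /fsx

mpirun --npernode ${_threads} --oversubscribe ior --posix.odirect -t 1m -b 1m -s ${_segment_count} -g -v -r -i 1 -F -k -D 0 -o ${_path}/ior.bin
----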

. Use ior to generate 128 GiB of data using 256 threads.
+
TIP: Monitor real-time throughput using the *EC2 Instance Connect* or *SSH terminal* session window with `*nload*` running.
+
[source,bash]
----
# 256 threads: 256 x 512 x 1 MiB segments = 128 GiB
_job_name=ior-128
_segment_count=512
_threads=256
_path=/fsx/${_job_name}

mkdir -p ${_path}
cd /fsx

mpirun --npernode ${_threads} --oversubscribe ior --posix.odirect -t 1m -b 1m -s ${_segment_count} -g -v -w -i 1 -F -k -D 0 -o ${_path}/ior.bin
----

== Next section

Click the button below to go to the next section.

image::enable-data-compression.jpg[link=../05-enable-data-compression/, align="left",width=420]