This file describes several sample command-line invocations and explains
how to interpret the results written to the output files.

The following script compares the performance of the Parallel, G1, and
Shenandoah garbage collectors on the same workload with different memory
sizes. The GC log file from the Parallel GC run is the most useful for
determining the application's allocation rates and live-memory usage.
This information is harder to extract from the G1 and Shenandoah logs
because these collectors run concurrently with the application, so the
states of individual objects are in flux at the moment each log report
is issued.

    #!/bin/sh
    for i in 256M 384M 512M 640M 768M 896M 1024M \
        1152M 1280M 1408M 1536M 1664M 1792M 1920M 2048M
    do
      echo Running Parallel GC with $i heap size >&2
      echo Running Parallel GC with $i heap size
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.parallel-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UseParallelGC -XX:ParallelGCThreads=8 \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

      echo Running G1 GC with $i heap size >&2
      echo Running G1 GC with $i heap size
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.g1-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=50000 -dCustomerThreads=20000 -dReportCSV=true

      echo Running Shenandoah GC with $i heap size >&2
      echo Running Shenandoah GC with $i heap size
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.shenandoah-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
        -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=50000 \
        -dCustomerThreads=20000 -dReportCSV=true
    done

Here's a different script that represents a workload configuration that
keeps more memory alive and thus requires larger heap size
configurations of the JVM. In this code, the first loop runs with
Parallel GC for the purpose of calibrating the workload's allocation
rates and live-memory usage. The second loop provides comparisons
between G1 and Shenandoah GC on this workload.

    #!/bin/sh
    for i in 30G 48G
    do
      echo Running Parallel GC with $i heap size on huge workload >&2
      echo Running Parallel GC with $i heap size on huge workload
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.calibrate-shen.parallel-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UseParallelGC -XX:ParallelGCThreads=16 \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
        -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
        -dServerThreads=100 -dProductReplacementCount=8 \
        -dProductReplacementPeriod=36s \
        -dProductNameLength=25 -dProductDescriptionLength=120 \
        -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
        -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
        -dBrowsingExpiration=1m -dSimulationDuration=40m \
        -dInitializationDelay=100ms -dReportCSV=true
    done

    for i in 30G 32G 48G 64G 96G 112G
    do
      echo Running G1 GC with $i heap size on huge workload >&2
      echo Running G1 GC with $i heap size on huge workload
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.show-shen.g1-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
        -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
        -dServerThreads=100 -dProductReplacementCount=8 \
        -dProductReplacementPeriod=36s \
        -dProductNameLength=25 -dProductDescriptionLength=120 \
        -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
        -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
        -dBrowsingExpiration=1m -dSimulationDuration=20m \
        -dInitializationDelay=100ms -dReportCSV=true

      echo Running Shenandoah GC with $i heap size on huge workload >&2
      echo Running Shenandoah GC with $i heap size on huge workload
      $JAVA_HOME/bin/java \
        -showversion \
        -Xlog:gc:batch.show-shen.shenandoah-gc.$i.log \
        -XX:+AlwaysPreTouch -XX:+UseLargePages -XX:-UseBiasedLocking \
        -XX:+DisableExplicitGC -Xms$i -Xmx$i \
        -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC \
        -XX:+ShenandoahPacing -XX:ShenandoahPacingMaxDelay=1 \
        -jar $EXTREMEM_HOME/extremem.jar \
        -dDictionarySize=500000 -dNumCustomers=10000 -dNumProducts=200000 \
        -dCustomerReplacementCount=40 -dCustomerThreads=20000 \
        -dServerThreads=100 -dProductReplacementCount=8 \
        -dProductReplacementPeriod=36s \
        -dProductNameLength=25 -dProductDescriptionLength=120 \
        -dProductReviewLength=32 -dSelectionCriteriaCount=8 \
        -dBuyThreshold=0.25 -dSaveForLaterThreshold=0.75 \
        -dBrowsingExpiration=1m -dSimulationDuration=20m \
        -dInitializationDelay=100ms -dReportCSV=true
    done

The CSV report written to standard output reports independently on the
timeliness of each of the repetitive operations that make up the
Extremem workload. Some of these operations are more susceptible than
others to randomness in execution path length. For example, in one
configuration, the number of products available for selection (based on
lookups of randomly selected keywords within the data base of all
existing products) ranged from 376 to 945. This is represented by the
following excerpt of standard output:

    +-------------------------
    |Products available for selection:
    |max,945
    |min,376
    |average,609.597
    +-------------------------

Given this, we would expect to see up to a roughly three-fold variation
(945/376 is about 2.5) in execution times for product selection based on
the size of the selection set alone, independent of GC interference.
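
As a rough check on that expectation, the spread can be computed directly
from the report. The following is a minimal sketch and not part of
Extremem: the function name selection_spread is hypothetical, and it
assumes the block appears in the report exactly as in the excerpt above
(a leading "|" on each line is tolerated but not required).

```shell
#!/bin/sh
# selection_spread (hypothetical helper): read an Extremem report on stdin
# and print max/min for the "Products available for selection" block.
# Assumes the block layout shown in the excerpt above.
selection_spread() {
    awk -F, '
        { sub(/^[|]/, "") }                 # drop the optional leading "|"
        /^Products available for selection:/ { in_block = 1; next }
        in_block && /^[+]-/ { in_block = 0 }   # the closing rule ends the block
        in_block && $1 == "max" { max = $2 }
        in_block && $1 == "min" { min = $2 }
        END { if (min > 0) printf "%.2f\n", max / min }
    '
}
```

Feeding the excerpt above to selection_spread on standard input prints
2.51, which is the most variation that the size of the selection set
alone would explain.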

The output report also describes the times required to process
particular steps of each "customer interaction". Preparation represents
the time required to gather up and present to the customer all of the
products that matched the customer inquiry. The following excerpt from
standard output reports on the times, in microseconds, required to
perform this step. Note that, out of a total of 600,000 customer
interactions, the time required to present product options ranged from
2 microseconds to 2.072002 seconds. This is far worse than the 3-fold
variation that would have been predicted by differences in problem size.
The additional variation is due primarily to interference caused by
various garbage collection activities.

    +-------------------------
    |Timeliness of preparation
    |Total Measurement,Min,Max,Mean,Approximate Median
    |600000,2,2072002,19753,9216
    |Buckets in use,12
    |Bucket Start,Bucket End, Bucket Tally
    |-256,0,0
    |0,1024,6981
    |1024,5120,3158
    |5120,13312,443628
    |13312,29696,113963
    |29696,46080,13483
    |46080,78848,9486
    |78848,144384,3469
    |144384,275456,1864
    |275456,537600,1345
    |537600,1061888,1541
    |1061888,2110464,1082
    +-------------------------

Following the line of output that reports summary information, the
distribution of measured service response times is reported as a
weighted histogram.
The interpretation of the above "bucket-list" data is that:

    6,981 samples required less than 1.024 ms
    3,158 samples required less than 5.120 ms and at least 1.024 ms
    443,628 samples required less than 13.312 ms and at least 5.120 ms
    113,963 samples required less than 29.696 ms and at least 13.312 ms
    13,483 samples required less than 46.080 ms and at least 29.696 ms
    9,486 samples required less than 78.848 ms and at least 46.080 ms
    3,469 samples required less than 144.384 ms and at least 78.848 ms
    1,864 samples required less than 275.456 ms and at least 144.384 ms
    1,345 samples required less than 537.600 ms and at least 275.456 ms
    1,541 samples required less than 1.061888 s and at least 537.600 ms
    1,082 samples required less than 2.110464 s and at least 1.061888 s

Each of the service response times reported by Extremem represents a
slightly different mix of new-object allocation, read access to
previously allocated objects, and mutation of previously allocated
objects.

Product replacement processing is a service that has a more predictable
workload. Each time product replacement processing is performed, the
same number of randomly selected products are replaced, as specified on
the extremem command line. Note that even with product replacement
processing, there is some expected variation in processing effort, since
the underlying representation of the product data base is a balanced
binary tree protected by a multiple-reader single-writer mutual
exclusion lock. If a large number of customers happen to be looking up
products at the moment a request to remove products arrives, this will
force a delay on the modifying thread while it waits for all the
customer threads to complete their lookup requests.

The standard output file also provides information regarding contended
access to shared data structures. For example, the following lines
report that the typical writer of the products data base (i.e.
the typical attempt to replace products) has to perform an average of
1.976 wait operations, with the total number of wait operations for any
single writer ranging from 0 to 573.

    +-------------------------
    |Products concurrency report:
    |Total reader requests,1199860
    |requiring this many waits,2010665
    |average waits,1.6757497
    |ranging from,0
    |to,783
    |
    |Total writer requests,160000
    |requiring this many waits,316137
    |average waits,1.9758563
    |ranging from,0
    |to,573
    +-------------------------

It is sometimes interesting to observe that the same workload
configuration exhibits different concurrency contention under different
garbage collection approaches. This is a secondary effect that results
when a thread that holds a lock becomes blocked waiting for garbage
collection services to be performed. This is a form of priority
inversion in that garbage collection, typically running at a lower
priority than application threads, is able to cause high priority
application threads to delay their efforts.

For this same workload configuration, product replacement processing
timeliness is represented by the following entries in the report written
to standard output:

    +-------------------------
    |Product replacement processing:
    |batches,20000
    |total,160000
    |min per batch,8
    |max per batch,8
    |average per batch,8.0
    |
    |Total Measurement,Min,Max,Mean,Approximate Median
    |20000,5,1436893,43450,10496
    |Buckets in use,12
    |Bucket Start,Bucket End, Bucket Tally
    |-256,256,114
    |256,2304,18
    |2304,6400,1
    |6400,14592,13535
    |14592,30976,2225
    |30976,47360,1037
    |47360,80128,1034
    |80128,145664,824
    |145664,276736,608
    |276736,538880,312
    |538880,1063168,256
    |1063168,2111744,36
    +-------------------------

In 20,000 measurements, the time required to replace 8 products ranged
from 5 microseconds to 1.436893 seconds. Compare the performance with
different GC techniques for a better appreciation of the full direct and
indirect impacts that GC implementation approaches have on service
response times.
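
When comparing runs, it can help to reduce each histogram to a single
number, such as the percentage of samples that landed in the slow
buckets. The following is a minimal sketch and not part of Extremem: the
function name tail_percent is hypothetical, it assumes the histogram
layout shown in the excerpts above with bucket boundaries in
microseconds, and it expects to be fed one histogram block at a time,
since it sums every bucket row it sees.

```shell
#!/bin/sh
# tail_percent THRESHOLD_US (hypothetical helper): read one Extremem
# histogram block on stdin and print the percentage of samples that fall
# in buckets starting at or above THRESHOLD_US microseconds.
tail_percent() {
    awk -F, -v threshold="$1" '
        { sub(/^[|]/, "") }                    # drop the optional leading "|"
        /^Bucket Start/ { in_hist = 1; next }  # bucket rows follow this header
        in_hist && /^[+]-/ { in_hist = 0 }     # the closing rule ends the histogram
        in_hist && NF == 3 {
            total += $3
            if ($1 + 0 >= threshold + 0) slow += $3
        }
        END { if (total > 0) printf "%.2f\n", 100 * slow / total }
    '
}
```

For the product replacement histogram above, tail_percent 145664 prints
6.06: 1,212 of the 20,000 batches (608 + 312 + 256 + 36) took at least
145.664 ms. Running the same command against the reports from the
Parallel, G1, and Shenandoah runs gives one figure per collector to
compare.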