JOverflow is a tool that analyzes Java heap dumps for a number of known memory
usage inefficiencies. This documentation is for the command-line version of JOverflow.
There is currently work in progress on integrating JOverflow with Mission Control.
The resulting interactive tool's output will likely be somewhat different, but its
basic concepts should remain the same.
How to use the tool
Downloading and running the tool
JOverflow takes a JVM heap dump in HPROF format. Files in this format usually
have .hprof extension and are typically generated with the following
command (for HotSpot VM; JRockit may use a different command):
jmap -dump:live,format=b,file=my_heap_dump.hprof <JVM process pid>
JOverflow reads the heap dump, processes the data, and produces a report
summarizing its findings. To run JOverflow:
- Build the Mission Control source code.
- Locate the built jar files for org.openjdk.jmc.common and org.openjdk.jmc.joverflow, and run the tool as
java -cp org.openjdk.jmc.joverflow-{version}.jar:org.openjdk.jmc.common-{version}.jar org.openjdk.jmc.joverflow.Main foo.hprof > foo-joverflow.txt
- For big heap dumps (over 1500M or so), you may want to adjust the JVM settings
in the command above. Typically you will need to add
-d64 -Xmx<1.2..1.5 x dump size>, as in the example below.
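For a roughly 8 GB dump, the full command line might look like this (the heap size and file names are illustrative):
java -d64 -Xmx12g -cp org.openjdk.jmc.joverflow-{version}.jar:org.openjdk.jmc.common-{version}.jar org.openjdk.jmc.joverflow.Main big_dump.hprof > big_dump-joverflow.txt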
JOverflow supports the following command line options:
- -depth_first Scan the heap in depth-first order (default). This
is usually faster, but can result in longer and less useful reference
chains leading to problematic objects. See the section on
top memory consumers
for more details.
- -breadth_first Scan the heap in breadth-first order. This is
usually slower, but generally results in shorter and more useful reference
chains for problematic objects, see above.
- -full_obj_histo Print full object histogram (normally only top
memory consumers are printed). See
Class and Object Information for more
details.
- -ref_chain_depth=<n> Print up to n elements in
reverse reference chains leading to problematic objects (default is 8).
See the section on reference chains for
problematic objects for more details.
- -ref_chain_stoppers=<prefix1,prefix2,...> Stop printing
a reference chain when a class whose name starts with any of the given prefixes is
reached (default is oracle.apps.). See the section on
top memory consumers
for more details.
- -pointer_size=<size in bytes> Explicitly specify JVM
pointer size to be used in calculations. It generally makes sense to use
this flag only for 64-bit heap dumps; see the next section for details.
An important note about heap dumps
The HPROF file format is platform-independent. Unfortunately, one (probably
unintended) consequence is that it provides very little platform-specific
information that tools like JOverflow or Eclipse Memory Analyzer (MAT) need in
order to accurately calculate how much space is consumed by Java objects in
the real JVM heap. The fundamental numbers that tools need for that are:
- Pointer size - how many bytes an object reference takes
- Object header size. In the heap, each Java object is preceded by a header
that contains a pointer to the object's class, some internal flags, etc.
- Object alignment. In the heap, objects typically cannot be placed at
arbitrary addresses. Starting addresses are at least 8-byte aligned for
both HotSpot and JRockit JVMs.
A HPROF file contains only the so-called identifier size, loosely corresponding
to the pointer size. It allows the tool to determine whether the JVM ran in
32-bit mode (in that case, identifier size is 4) or in 64-bit mode (size 8).
In 32-bit mode, the other numbers are, fortunately, fixed and are the same for both
HotSpot and JRockit:
- Pointer size is 4 bytes
- Object header size is 8 bytes. For arrays, an additional 4 bytes are used
for the array length, effectively resulting in an object header size of 12 bytes.
- Object alignment is 8 bytes.
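As a quick worked example of how these numbers combine (a sketch based on the 32-bit values above, not actual tool output):
- An instance of java.lang.Integer takes 8 bytes of header plus 4 bytes for its int value, i.e. 12 bytes, which 8-byte alignment rounds up to 16 bytes.
- A char[10] array takes 12 bytes of header (8 plus 4 for the length) plus 20 bytes of characters, i.e. 32 bytes, which is already 8-byte aligned.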
However, in 64-bit mode the situation is much more complex. It stems from the
fact that in this mode the JVM may use narrow (compressed)
references, which means that 32-bit, effectively truncated pointers are used
within a 64-bit heap. The description in the
JVM bug 7145625
accurately summarizes what happens in that case.
The above bug requests the HotSpot developers to provide the required numbers
in the HPROF file, but it is currently unclear when it's going to be addressed.
Thus, at present, JOverflow does not have a 100 percent reliable way to find out
whether wide or narrow (compressed) references are used, and what the object
alignment is. It is possible to specify pointer size explicitly, using the
-pointer_size flag. Otherwise, for 64-bit heap dumps the tool
currently uses guessing logic that works as follows:
- Initially, pointer size is assumed to be 4 bytes if the file size is less than
26GB, and 8 bytes otherwise. That is, narrow references are assumed if the
heap size (which more or less correlates with the heap dump size) is smaller
than the limit used in HotSpot for switching between narrow and wide
references.
- Object alignment is always assumed to be 8 bytes (same situation as above)
- Object header size is initially assumed to be 12 bytes (plus additional
4 bytes for arrays). However, if during
subsequent heap dump scanning JOverflow comes across any class that belongs
to jrockit.vm.* package, it assumes that the heap dump has
been generated by JRockit, and changes the header size to 8 bytes (plus
4 bytes for arrays). According to JRockit engineers, that VM uses 8-byte
headers on all platforms, so this part of our calculations is accurate,
assuming JRockit was indeed used.
- After all objects have been read from the heap dump, the tool attempts
to determine the precise value of pointer size by looking at pairs of
consecutive objects. We use the fact that object IDs, that is, numbers
by which objects in the heap dump reference each other, are actually
memory addresses of objects at the time when the heap dump was taken.
So, the difference between IDs of two consecutive objects would be
equal to the actual size, in memory, of the first object. The tool
calculates the expected object size using the current pointer size,
and then compares it with the actual object size above. If the numbers are
the same, it's assumed that the current pointer size is correct. If the
numbers are different, the alternative pointer size is used to calculate
the alternative expected object size. If the alternative object size matches
the actual size, it's assumed that the alternative pointer size is
correct. If neither number matches, the same operation is tried for another
pair of objects, until a certain maximum number of objects is reached. Only
objects that have fields of pointer and int types are used in this
calculation, to avoid uncertainties with the sizes of fields that may take less
than 4 bytes, e.g. boolean or char.
Given this guessing logic, it may well happen that for the same 64-bit heap dump
JOverflow and other tools, e.g. MAT, would report different object sizes and,
consequently, different total heap size. Furthermore, it may well happen that both
tools would be wrong. However, in the past the author has observed that changing
these numbers could affect absolute figures considerably, but, interestingly, would
not affect relative overhead estimates that much. That is, estimated memory usage
and/or absolute overhead due to, say, empty HashMaps may jump from 200MB
to 300MB, but the relative overhead only changes from 8 per cent to 9 per cent.
JOverflow heap report
A JOverflow report is divided into sections, each devoted to one or
more memory problems of the same kind. For each problem, we determine the number
of objects suffering from it, and calculate the overhead: the theoretical amount
of memory that would be saved if the problem was eliminated. The amount of
overhead can be considered a problem rank: the higher it is, the more sense it
makes to fix the given problem.
Section 0: Important Issues Overview
This section contains a summary of the most acute problems detected by JOverflow.
They are reported in detail in other sections referenced from this one. A problem
of a particular kind gets into Section 0 if its overhead is higher than the fixed
relative threshold (for example, 2 per cent for empty or sparse collections).
Threshold values have been chosen based on statistics for a number of real-life
heap dumps.
Section 1: Overall Stats
This section presents a diverse set of data. First, the fundamental values needed
to calculate object sizes and overhead are reported: pointer size, object header
size and object alignment. Next, overall heap stats are reported: the number of Java
objects in the heap and its breakdown into instances, object arrays and primitive
arrays. Finally, we report some "assorted raw stats":
various metrics that may occasionally signal certain memory usage anomalies,
that have not made their way into more sophisticated analysis yet. For example,
a high (more than 15-20 per cent) overhead due to object headers would be bad,
and most likely would be due to a very large number of small objects in the heap.
Note that when the stats in this section are calculated, the tool does not make a
distinction between objects that are parts of more complex data structures that
JOverflow recognizes (known Java collections and java.lang.String
instances) and all other objects, which we call standalone objects. In
other words, when calculating raw stats, all Java objects are treated in the
same way.
Some notable pieces of assorted raw stats are:
- Size of all *$Entry instances: reports the total size of all objects
whose class name ends with $Entry, for example HashMap$Entry.
Such objects are almost always collection implementation details, and their total
size gives some idea of overhead due to collection internals.
- Number and overhead of short arrays. These metrics are reported, since each
array in a Java heap incurs the overhead of array header and alignment, see
above. The relative value of this overhead for a short array is larger than
for a longer one. A large number of short arrays can become very wasteful.
- Overhead of all null array entries: here we just add up the size in bytes
of all null pointers in object arrays.
- Number and overhead of short Strings. The problem with short Strings is
similar to that with short arrays. The overhead is calculated as the number of
java.lang.String objects with the value of String.length() in the
given range, multiplied by the shallow size of String.
- Number and overhead of boxed numbers. Objects such as java.lang.Integer,
java.lang.Character etc. take much more memory than values
(int, char) that they wrap internally. Thus excessive usage
of such objects can be very wasteful. The overhead of boxed number objects is
calculated, for each type, as
(sizeof(Number) - sizeof(number) + pointer_size) * num_of_Number_instances,
where e.g. for Number equal to java.lang.Integer, number
is int. We add pointer_size in the above formula, since it
is expected that for each boxed number instance there is at least one pointer
somewhere, and that takes valuable space as well.
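As a hedged worked example of the formula above, assuming the 32-bit sizes discussed earlier: for Number equal to java.lang.Integer, sizeof(Integer) is 16 bytes and sizeof(int) is 4 bytes, so each boxed instance contributes 16 - 4 + 4 = 16 bytes of overhead; a million distinct Integer objects would thus account for roughly 16MB.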
There are two special subsections in this section as well. The first one gives
some overall statistics related to class loaders. The two pieces that may be
somewhat non-obvious are:
- Classloader instances with loaded classes: this is gathered by walking all
the Class objects in the heap dump and gathering references to their
loaders. As a result, we know the number of loaders that do have loaded classes,
and how many separate types they belong to.
- Classloader instances with only one loaded class: we check how many of the
above-mentioned loader objects are only referenced by one Class object,
and how many types these "single-loaded-class" loaders belong to.
The second subsection gives statistics on Strings that are either compressed
or compressible. Here by "compressible String" we mean one that contains only
characters with codes in the 0..255 range (Latin-1), and therefore its backing char[] array can
potentially be replaced with a byte[] array (or is actually replaced
with one, as done in some JDK versions - that is what the lines mentioning
byte[] refer to).
One important aspect of gathering data for this section is related to the fact
that in JDK versions prior to JDK 7u6 two or more String objects can use
different parts of the same backing char[] array (or a
single String can use only a subset of its backing array). This
means that there is generally no simple answer to questions like "how many
characters are used by this group of String objects". The same character in
some backing char[] array may be used by several String objects, or
not used at all. This is important, because in a given char[]
array some characters may be ASCII (compressible) and some not. So here we
only look at characters that are actually (logically) contained in
each String. Furthermore, when calculating "used bytes in backing arrays",
we also use only the above-mentioned characters, and we don't count characters
in a backing array if we have already seen that array before. This can lead
to the "number of used bytes in backing arrays" in this section being
considerably smaller than the total number of bytes in String backing arrays
as presented in the Class and Object Information
section.
Section 2: Class and Object Information
In this section, the tool reports the total number of Java classes and Java
objects. Java objects are class instances and arrays. JOverflow also reports how
many classes in the dump have no instances at all, or just one instance. Some
number of classes with no instances is present in any heap dump, since
some classes only contain static fields and methods, instances of many anonymous
classes tend to be short-lived, etc. However, a large number of classes with no
instances may signal a problem, such as machine-generated classes that are
always generated by some library but never used by the application.
Finally, an object histogram is printed for classes with instances consuming at
least 0.1 per cent of the total heap size. Typically such objects collectively
consume almost all of the heap, but represent a small fraction of the total
number of classes. To get a full list of classes, run the tool with
-full_obj_histo flag.
In the object histogram, we report the following numbers for each class: number
of instances, shallow size of instances, and implementation-inclusive size of
instances. The latter number is the same as shallow size for most classes,
except for:
- Known Java collections: container classes in java.util.* and
java.util.concurrent.* packages, and their subclasses, including
user-defined ones. For a collection, implementation-inclusive
size is the size of the collection object itself, plus the size of all
objects that are, logically, a part of this collection's implementation.
For example, for a HashMap that would be the HashMap$Entry[] array
referenced by HashMap.table field, plus the total size of all
HashMap$Entry objects referenced from it. However, the contents
of this map, that is, keys and values, are not included. Since many
collections, such as ArrayList, contain an Object[]
array in their implementation, the implementation-inclusive size for
Object[] arrays in the JOverflow object histogram is smaller
than their shallow size. That's because implementation-inclusive size for
Object[] only includes standalone arrays.
- String instances. An implementation-inclusive size for a
String is the size of the instance plus the size of its backing
char[] array. For this reason, implementation-inclusive size
of char[] arrays is also smaller than their shallow size: it
includes only standalone arrays.
The "paradox" with shallow vs implementaiton-inclusive size for char[],
Object[], HashMap$Entry etc. is easy enough to understand.
However, it turns out that occasionally implementation-inclusive size may be
smaller than shallow size for HashMap and LinkedHashMap!
That happens when a heap dump contains many instances of HashSet or
LinkedHashSet and few instances of HashMap or
LinkedHashMap. An instance of HashSet always points to an
instance of HashMap, same with LinkedHashSet and
LinkedHashMap. Shallow size for HashMap is calculated for
all HashMaps, both standalone and those that are owned by
HashSets. So, when there are many HashSets in the heap dump,
this number may end up pretty big. However, implementation-inclusive size for
HashMap is calculated only for standalone HashMaps. Thus if
there are far fewer HashMaps than HashSets, this number, based
on a small number of standalone HashMaps, may end up relatively small.
Sections 3,4: Number, size and nearest fields/reference chains for top
memory consumers
Top memory consumers are classes whose instances consume the highest amount of
memory. In this section, JOverflow provides information on data structures that
hold these objects in memory.
From the class histogram, the tool determines classes whose instances consume the
highest amount of memory. Next, it performs a depth-first or breadth-first scan of
the heap dump, starting from GC roots. Whenever an instance of one of the above classes
is detected, a whole reference chain to it is recorded. It is important that within
a reference chain, we don't distinguish individual array elements or collection
elements. So, for example, if object A is reachable via a chain looking like
GC_root_1 -> Foo.bar[0], and object B is reachable via the chain
GC_root_1 -> Foo.bar[1], JOverflow will record the same reference
chain for both, which will look like GC_root_1 -> Foo.bar -> Object[].
Furthermore, when multiple objects get reached via the same reference chain,
the information for these objects is aggregated. In the end, we know how many
objects of given types (for example, 100 instances of HashMap and 200
instances of LinkedHashMap) are reachable via the given reference chain,
and how much memory they use. A group of objects reachable via the same reference
chain is called an object cluster.
Note that breadth-first scan (you can choose it via the command-line option) is
usually slower than depth-first, but generally results in shorter and more
useful reference chains. That's because, by design, a reference chain (at
least from the given GC root) via which we get to the given object in this way
is the shortest possible one. The shortest reference chain is what people
usually use as a "primary" one and perceive as the most "logical" one. If the
same object is reachable via two reference chains, the longer one tends to
include various secondary data structures, WeakHashMaps and
so on. If there is a group of objects reachable via both short (good) and long
reference chain, depth-first scan may report half of them under the first
chain, and another half under the second. As a result, it may be less clear
where the biggest groups of problematic objects are.
Section 3 presents the above information aggregated one level further: for each
object cluster we take only the data field that references the cluster directly,
and then merge all clusters with the same nearest referencing data field. This
representation is more compact, results in fewer clusters, and is generally more
useful. That's because when the user tries to figure out how to fix a
particular problem, in many cases they need to look no further than the code
that creates and directly manages a particular object, which is in
the same class as the data field referencing that object.
Section 4 presents reference chains described in the first paragraph, printed
in reverse order - from the object cluster to the GC root. Intermediate
arrays, collections and custom linked lists are printed in the condensed form. That is,
we avoid printing internals of e.g. a HashMap, and thus a part of the
chain that in reality looks like HashMap$Entry.key <- HashMap.table[]
<- Foo.myMap would be represented as {HashMap} <- Foo.myMap.
Reference chains may still be very long, so to save space they are truncated.
By default, that happens either after the 8th element or after the first
element that belongs to an oracle.apps.* class. There are two
command line flags, -ref_chain_depth=<n> and
-ref_chain_stoppers=<prefix1,prefix2,...> that can be used
to override these values.
Section 5: Problematic Collections
In this section, known collections that suffer from problems recognized by
JOverflow are reported. For each problem, types of relevant collections, number
of collection instances, and total overhead are reported.
Known collections are defined as container
classes in java.util.* and java.util.concurrent.* packages
and their subclasses. JOverflow currently recognizes the following problems
with them (some ways of addressing these problems are suggested in
the separate section of this document):
- Empty unused: a collection is empty if it contains no elements. The
overhead of an empty collection is defined as the amount of memory that would
be saved if the collection was not allocated at all - that is, the
implementation-inclusive size in bytes of the collection (see previous section).
Empty collections where capacity may exceed size - typically
array-backed collections, such as HashMap and ArrayList -
may incur surprisingly high overhead, especially if the initial capacity is
higher than the default value. For a ConcurrentHashMap
in JDK releases prior to 7, the size of an empty instance initialized with
default values is a whopping 1200 bytes. Fortunately, in more recent JDK versions
its code has been modified to create internal ConcurrentHashMap.Segment
objects lazily, which greatly reduced this overhead.
Empty unused are empty collections that furthermore have never been used,
i.e. never contained any elements. This is determined using the long or int
modCount field that is defined in most collections and is incremented
every time a collection is updated.
- Empty used are empty collections that previously contained some
elements. That is assumed to be the case if the modCount field (see the
previous item) is not zero.
- Empty are empty collections for which we cannot determine whether
they previously contained any elements, because they don't have the
modCount field.
- Small Sparse: a collection is sparse if the number of elements in it
is less than half of its capacity. Consequently, only collections where
capacity may exceed size - typically array-backed collections - can be sparse.
The overhead is defined as the total size in bytes of empty slots (null pointers)
in the collection - that is, how much memory would be saved if the collection did
not contain any empty slots. A collection is furthermore small sparse
if its capacity does not exceed the default capacity (e.g. 16 for HashMap).
See the discussion of fixing problems in collections
on why the distinction between small and large (see below) sparse collections
is important.
- Large Sparse: these are collections that are sparse (see above) and
have a capacity larger than default. See the
discussion of fixing problems in collections
for further details.
- Boxed: these are collections that contain boxed numbers, i.e.
instances of classes such as java.lang.Integer, java.lang.Double
etc. Such collections are considered problematic since instances of boxed numbers
take a lot more memory than int, double etc. numbers that they
wrap. The overhead of a boxed collection is defined as the amount of memory that
would be saved if we replaced the collection with one (for lists) or two (for
maps) arrays of the corresponding primitive numbers. Technically, its value is
calculated as
collection_impl_size + (sizeof(Number) + pointer_size - sizeof(number)) * num_elements.
Here Number stands for java.lang.Integer, java.lang.Double
etc., and number stands for int, double, etc.
Note that currently a collection is considered boxed if a random
sample of a key or value or both returns a boxed number object.
- Vertical bar: this is applicable only to collections that are similar
to two-dimensional arrays, that is, lists of lists or lists of arrays. A
"vertical bar" is a collection where outer dimension M is (much) larger
than the inner dimension N. Such a collection is wasteful, since each short inner
array or list incurs a per-object overhead (list or array implementation size)
and requires a pointer from the outer list. The overhead of a bar collection is
defined as the amount of memory that would be saved if outer and inner dimensions
are "flipped", i.e. we replace an M by N list with N by M one.
Technically, it is calculated as (M - N) * (pointer_size + array_or_list_overhead)
- Small: these are collections that contain a very small number of elements
(currently defined as 4 or fewer). Such collections are considered problematic,
since for them the "fixed costs" (size of implementation details that don't
depend on the number of elements in collection) are comparable to, or higher
(sometimes much higher) than the total size of "workload" in that collection.
For example, the minimum amount of memory used by a HashSet, which
internally consists of several Java objects with multiple data fields each, is around
100 bytes on a 32-bit JVM. If a HashSet contains just 3 elements, its
workload size is 12 bytes - an order of magnitude smaller than its implementation
size. The overhead of a small collection is defined as the amount of memory that
would be saved if the collection is replaced with one (for lists) or two (for
maps) arrays of objects containing this collection's elements. Technically,
it is calculated as
collection_impl_size - (pointer_size * num_elements + array_overhead)
(for lists) or
collection_impl_size - 2 * (pointer_size * num_elements + array_overhead)
(for maps).
Section 6: Problematic Standalone Object Arrays
This section is similar to the previous one, but covers standalone object
arrays - that is, arrays of Java objects that are not a part of the implementation
of any collection known to JOverflow. The following problems are recognized:
- Length 0: arrays that have length zero, i.e. cannot contain any
elements. Such arrays might be useful in very few situations; usually they are
a useless by-product of methods that blindly allocate arrays with specified
length. The overhead of such an array is defined as its size in bytes, which
is array header plus possible object alignment.
- Length 1: arrays that have length 1. An array of length 1 is
equivalent to a direct reference to the object that it contains - in other words,
the whole array is unnecessary. Thus its overhead is defined as its size in bytes.
- Empty: an object array is empty if it contains only null elements.
The overhead is defined as the array size in bytes.
- Sparse: an object array is sparse if the number of non-zero elements
in it is less than half of its length. The overhead is defined as the total
size in bytes of empty slots (null pointers) in this array - that is, how much
memory would be saved if there were no empty slots.
- Boxed: these are arrays that contain boxed numbers, e.g.
java.lang.Integer. See the description for boxed collections above.
The overhead of a boxed array is calculated as
(sizeof(Number) - sizeof(number)) * num_unique_referenced_Number_objects +
pointer_size * num_references_to_Number. That is, we take care of a
situation when multiple elements of a boxed array reference the same
boxed number object. That would occur if, for example, the majority of
array's elements are instances of java.lang.Integer with small
values, for which Integer.valueOf() method returns unique cached
objects. Note that an array is considered boxed even if it contains just
one boxed number object, but the overhead strictly depends on the number
of such objects.
- Vertical bar: this is applicable only to two-dimensional arrays,
and is very similar to vertical bar collections described in the previous
section (in fact, long arrays of short lists would also fall into this category).
The overhead is defined in the same way, as the amount of memory that would
be saved if array dimensions were "flipped".
Section 7: Problematic Standalone Primitive Arrays
This section is similar to the previous two, but covers standalone primitive
arrays, that is, arrays of primitive types, such as int or
char, that are not a part of any collection known to JOverflow,
and not backing char[] arrays of java.lang.String
instances.
- Length 0: this is defined and handled in exactly the same way as
zero-length object arrays.
- Length 1: arrays that have length 1. A primitive array of length 1
is equivalent to a single field with the value that this array contains. Thus the
overhead of such an array is defined as its size in bytes plus pointer size
minus the size of the contained value (e.g. 2 bytes for a char).
- Empty: these are arrays that contain only zeros. The overhead is
defined as the array size in bytes.
- Long zero-tail (LZT): these are arrays that end with a contiguous
block of zeros that is longer than half of the array's length. Arrays
looking like this often happen to be buffers in classes such as I/O streams,
text parsers, etc., that have been allocated with excessive capacity.
The overhead for such an array is defined as the size in bytes of the block
of zeros in the end of the array.
- Unused high bytes: these are arrays of type char[], short[],
int[], long[], where each element uses no more than a
half of its bits. For example, an int[] array where each element
is in the Short.MIN_VALUE .. Short.MAX_VALUE or
Byte.MIN_VALUE .. Byte.MAX_VALUE range. If by design an element that
is out of this range can never be added to this array, such an array can be
replaced with short[] or byte[], saving 1/2 or 3/4 of
memory used by it.
Sections 8 and 9: Number, overhead and nearest fields/reference chains for
problematic collections and arrays
These sections give information on data structures that hold problematic
collections and arrays in memory. It is gathered and presented in the same way
as the information on top memory consumers in sections 3 and 4. One
difference is that the tool reports memory overhead rather than
memory usage for problematic objects. Furthermore, as we know from the
previous section, many different kinds of problems can occur with collections
and arrays, and sometimes the same object can have several problems. Thus the
tool reports both problem kinds and associated overheads for each object
cluster.
It may happen that in addition to problematic objects in some cluster, there is
also some number of "good" collections or arrays reachable via the same
reference chain. If so, the number of good objects is printed after all the
information on the bad ones in the given cluster, in both sections 8 and 9.
Knowing the number of good objects is important when choosing the way to fix
a problem with bad ones, as discussed in the next section.
Fixing known problems with collections and arrays
In many cases, the best fix to a performance problem stems from deep understanding
of the code, and architectural changes that completely eliminate problematic
objects and/or result in more appropriate data structures. The tips presented
below assume that either the application is already considered well-optimized,
or only limited, local changes can be made to it for whatever reason.
Empty collections and arrays. A radical way to avoid the
overhead of empty collections is to avoid allocating them until the first use
(lazy initialization). If a collection is referenced by a data field that's
accessed by many methods, switching to lazy initialization would require changing
the code of all these methods, so that they check if the field is null and
initialize it if so. That would make the code less readable and more brittle;
the alternative is to replace all these direct uses with calls to a single
accessor method that deals with lazy initialization. Still, such changes may be
time-consuming, and in some cases may create races in object initialization if
these methods can be called by multiple threads.
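A minimal sketch of the accessor-based lazy initialization described above (the Customer and Order classes are invented for illustration; as the text warns, this version is not safe for concurrent callers):

import java.util.ArrayList;
import java.util.List;

class Customer {
    private List<Order> orders; // not allocated until the first order is added

    private List<Order> orders() {
        if (orders == null) {
            orders = new ArrayList<>(4); // deliberately small initial capacity
        }
        return orders;
    }

    void addOrder(Order order) {
        orders().add(order);
    }

    int orderCount() {
        return orders == null ? 0 : orders.size();
    }
}

class Order { }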
For this reason, in some situations it makes sense to leave eager collection
initialization in place, but change its initial capacity to a smaller number,
e.g. use new HashMap(8) instead of new HashMap() (the latter
is equivalent to new HashMap(16)). This will reduce the overhead for
those HashMaps that stay empty, but may result in some performance
overhead due to more frequent collection resizing if many HashMaps
allocated at the same site end up growing bigger than their initial capacity.
Here, as well as in other cases described below, the information that JOverflow
provides on the number of good collections referenced by the same field may be
very helpful. If most collections are bad, the penalty for the few good
ones that will be negatively affected by the fix is likely to be negligible.
However, if a considerable number of collections allocated at the given point
in the code are good, the fix will have to be more sophisticated.
Length 0 arrays are useless in most cases, and should be avoided
whenever possible. However, one common case where this cannot be done easily is
when some array (a data field, or a value returned by some method) is handled
in many places in the code, and changing them all to check for null is
difficult. In that case, a reasonable alternative is to create an empty array
of the appropriate type once, save it in a special static field, and then
assign or return this singleton instance every time a zero size is passed in.
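A minimal sketch of that shared empty array approach (the class and field names are invented):

class Messages {
    private static final String[] NO_MESSAGES = new String[0];

    static String[] toArray(java.util.List<String> messages) {
        // Hand out the shared instance instead of allocating a new String[0] each time
        return messages.isEmpty() ? NO_MESSAGES : messages.toArray(new String[messages.size()]);
    }
}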
Length 1 arrays are wasteful, because such an array can, in principle, be
replaced with either a direct reference to the single element that it contains,
or a primitive value that it contains. However, that cannot always be done, e.g.
Bar[] Foo.bar may point to arrays of length 1 in 98 per cent of cases,
but to long arrays in the remaining cases. In that case, it may make sense to
change the type of Foo.bar from Bar[] to Object
and then make it point either to a single instance of Bar or an array.
The code that deals with bar will likely look more complex than before
because of instanceof checks, but it may be justified by the resulting
memory savings.
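A sketch of that approach, using the hypothetical Foo.bar field from the paragraph above:

class Bar { }

class Foo {
    private Object bar; // holds either a single Bar or a Bar[]

    void setBars(Bar[] bars) {
        // Store the lone element directly in the common length-1 case
        this.bar = (bars.length == 1) ? bars[0] : bars;
    }

    Bar getBar(int index) {
        if (bar instanceof Bar) {
            if (index != 0) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            return (Bar) bar;
        }
        return ((Bar[]) bar)[index];
    }
}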
Sparse collections and arrays: their overhead can be reduced by
changing the initial capacity (but see the warning above regarding possible
frequent resizing as a result). The task is easier if we know there are no
or very few "good" collections in the same cluster. Small sparse collections
are usually easier to fix. Such a collection is typically a result of creating
a standard collection with the default constructor, and then adding just a few
elements to it. To reduce the overhead, it is sufficient to hardcode a more
appropriate (smaller than default) initial capacity parameter to the
constructor. Keep in mind that one of the most popular collections,
java.util.HashMap, with default capacity 16, can by design only have
a power of two capacity. Thus, for example, invoking new HashMap(10)
will still produce a HashMap instance with capacity 16.
Collections that are large yet sparse may be a result of specifying too large
an initial capacity, removing many elements from the collection (most
array-based collections don't "shrink" in that situation), or, in some
cases, be a side-effect of automatic collection growth. For example, the
capacity of a HashMap is doubled when the number of
elements in it grows larger than 3/4 (the default load factor) of the
current capacity. Thus in the updated HashMap, only 3/8 of slots are
occupied, making it sparse.
Fixing large sparse collections is not always easy, since one needs to determine
the root cause of the problem. For sparse HashMaps with appropriate
initial capacity and no element removal, it is possible to reduce sparseness
by specifying, in the constructor, a load factor that is higher than the
default (0.75). However, this measure may affect CPU performance (any hash table
that is nearly full experiences a larger number of hash collisions), and thus
should be used with great care.
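For instance, a map expected to stabilize at around 900 entries could be created as follows (an illustrative sketch; remember the CPU caveat above):

// capacity 1024 with load factor 0.9 resizes only above roughly 921 entries
java.util.Map<String, Object> lookupTable = new java.util.HashMap<>(1024, 0.9f);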
Boxed collections can both create a considerable memory overhead and
increase GC pressure due to the large number of small objects that they contain.
Fortunately, in most cases what the user needs to store in such a collection
are actual numbers, rather than instances of java.lang.Integer etc.
Thus it is best to replace problematic boxed collections with specialized
ones that store primitive numbers. The Apache Commons library provides such
collections, though you may or may not be allowed to use it; GNU Trove is another
option, but is less likely to be allowed due to licensing issues. In many
cases, e.g. when deleting elements from the collection is not required, it
is easy to write a custom implementation.
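A minimal sketch of such a custom primitive-number collection, as a replacement for an ArrayList<Integer> (illustrative only; the libraries mentioned above provide complete, well-tested versions):

import java.util.Arrays;

class IntList {
    private int[] data = new int[8];
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow like ArrayList does
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}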
Vertical bar collections and arrays. As already mentioned, the easiest
way to reduce overhead of such objects is to "flip" their dimensions.
Furthermore, one can generally convert a two-dimensional array into a
single-dimensional one (albeit at the cost of making code somewhat more complex
and less readable) if, for example, there are many such arrays in the
heap and the overhead of sub-arrays is considerable.
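A sketch of the flattening idea for a numeric M x N structure (the class name and element type are invented):

class Matrix {
    private final int cols;       // N, the small inner dimension
    private final double[] cells; // one long array instead of M short ones

    Matrix(int rows, int cols) {
        this.cols = cols;
        this.cells = new double[rows * cols];
    }

    double get(int row, int col) {
        return cells[row * cols + col];
    }

    void set(int row, int col, double value) {
        cells[row * cols + col] = value;
    }
}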
Small collections. To save maximum amount of memory, a small collection
should be replaced with an array of objects. Of course, in reality it is not
always easy to do that. Perhaps the simplest case is when all collections
referenced by field Foo.c are of the same size, initialized in a
constructor and are not updated afterwards. If replacing with an array is
difficult, but at least it's known that any collection to which Foo.c
can ever point is small, it may make sense to think about an alternative,
more economical collection class. For example, one may replace a small
HashSet with an ArrayList and use the indexOf()
method of the latter instead of HashSet.contains() without a noticeable
loss of performance. Similarly, for a small number of elements a
HashMap with manual synchronization is likely to perform no worse,
or even better than a ConcurrentHashMap.
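A sketch of the ArrayList-instead-of-HashSet idea (the class name is invented):

import java.util.ArrayList;
import java.util.List;

class SmallTagSet {
    private final List<String> tags = new ArrayList<>(4);

    void add(String tag) {
        if (tags.indexOf(tag) < 0) { // linear scan, cheap for a handful of elements
            tags.add(tag);
        }
    }

    boolean contains(String tag) {
        return tags.indexOf(tag) >= 0; // replaces HashSet.contains()
    }
}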
Long zero-tail primitive arrays are largely equivalent to sparse object
arrays, and the same advice about choosing right initial capacity applies to
them.
Primitive arrays with unused high bytes should, ideally, be replaced
with arrays of more appropriate type, e.g. int[] with short[].
Section 10: Duplicate String Stats
Duplicate strings are multiple instances of java.lang.String with the
same value. For example, when Foo.foo.equals("xyz"), Foo.bar.equals("xyz"), and
Foo.foo != Foo.bar, the string "xyz" is duplicated: there are two
String instances with this logical value, referenced by the data fields
Foo.foo and Foo.bar.
Every instance of java.lang.String points to a backing char[]
array with the actual string contents. Therefore string duplication can take two
forms: two or more String instances with the same logical value can be
backed by different char[] arrays with identical contents, or by the
same char[] array. In the first case, all but one of the String
objects, and their backing arrays, are considered to be overhead by our tool. In
the second case, only the redundant String objects are overhead.
Most Java applications generate some number of duplicate strings. Sometimes
only 20-30 per cent of all String instances are unique, and the
overhead of duplicate strings can be as high as 20-25 per cent of the total
heap size. However, usually there are quite a lot of different string values
for duplicate strings, and very rarely copies of a single string take more
than one per cent of the heap. In this section, JOverflow gives a list of individual
strings causing overhead of 0.1 per cent of the heap or more. Usually this list
is pretty short. For the vast majority of duplicate strings, the contribution
of each individual string is much smaller. Thus it makes sense to consider such
strings only when they are aggregated with others based on a common reference
chain and/or the nearest referencing data field.
The exact meaning of the metrics reported in this section is explained below.
For illustration, we assume that a heap dump contains just six separate
String objects:
s1 = "foo"; s2 = "bar"; s3 = "foo";
s4 = "bar"; s5 = "abc"; s6 = "xyz";
Furthermore, s1 and s3 are backed by the same char[]
array, whereas s2 and s4 are backed by two different arrays
with identical contents.
- Total strings: the total number of java.lang.String
instances, 6 in our example
- Unique strings: the number of separate logical values among all strings.
In our example, it will be 4: "foo", "bar",
"abc", "xyz"
- Duplicate values: the number of separate logical values among strings
that are duplicated. 2 in our example: "foo", "bar"
- Overhead: the size, in bytes, of redundant String
objects and backing arrays (in our example that's two String objects,
s3 plus one of s2 and s4, and one redundant char array containing "bar").
Additionally, for each of the top duplicate strings the following numbers are
reported:
- Overhead: calculated as specified above
- Number of char arrays: how many backing char arrays exist for this
string value
- Number of objects: how many String instances with this value exist
- Maximum array length: the length of the longest backing char array.
This number may be greater than the corresponding string length due to the fact
that prior to JDK 7u6 a String (or more likely several String instances) can
be backed by a char array that is longer than the String itself (that is, the
String uses only a fragment of that array).
- Value: string value; can be truncated if too long.
Sections 11 and 12: Number, overhead and nearest fields/reference chains for duplicate strings
Information in these sections is collected and aggregated in exactly the same
way as for problematic collections reported in sections 8 and 9. The notable
highlights are:
- For each cluster, the values of several top duplicated strings are printed.
- A cluster of duplicated strings may include more than one unique string
value - that is, four String objects with values "a", "a", "b",
"b", reached via the same reference chain, are aggregated into the same
cluster.
- For each cluster, JOverflow presents the total number of duplicated strings,
the number of duplicated backing arrays, and, finally, the number of
non-duplicated strings (those with logical value unique in this heap dump)
reached via the same reference chain. This information is important for deciding
how to get rid of the particular group of duplicated strings, as discussed in the
next section.
- If some string is listed as a duplicate in the given cluster, its other
copies may belong to other cluster(s) as well. In other words, the fact that a
string is duplicated is treated as its global property, not a local one within
the given cluster.
- When calculating overhead for a particular string cluster, JOverflow
considers only the String objects within this cluster and their backing
char[] arrays. Furthermore, the same backing array is never used in
calculations more than once.
Getting rid of duplicate strings
As with problematic collections, better overall application architecture often
leads to less redundancy, fewer objects, and thus fewer duplicated objects,
including strings.
However, some operations, e.g. parsing a text document, inevitably produce
duplicated strings. Depending on how these strings are handled subsequently,
it may be possible to come up with efficient case-specific duplication
elimination methods. This is, however, beyond the scope of this document.
Below we suggest one universal technique for locally eliminating or reducing
the number of duplicated strings. This technique often works efficiently, but
sometimes it may cause considerable overhead of its own, both in terms of memory
and CPU cycles. Thus, as any optimization, it should be fully understood and
used with care.
The technique is based on string interning and works as follows. Using
JOverflow output, one can determine that some data field Foo.foo points
to a sufficiently large cluster of duplicated strings. Provided that the number
of duplicated strings in this cluster is considerably (ten times or more) higher
than the number of non-duplicated strings reported for the same cluster (that is
why this number is important!), the following can be done:
- Find out where Foo.foo is written. The simplest situation is
when it is a private field that is initialized in the constructor and never
changed afterwards. The worst case is if it's a public field and can be written
anywhere in the code.
- Add a call to String.intern() method at every line where
a value is stored into Foo.foo. For example, if there is a line in
the constructor like this.foo = s; replace it with
this.foo = s.intern()
The String.intern() method returns a unique representation for each
String object, and thus Foo.foo will be guaranteed to point at
no more than one String object for each unique value. However, you should
take into account the following:
- The call to String.intern() is not free. In particular, it may
incur a noticeable CPU overhead in multithreaded applications due to internal
synchronization. That is, to avoid storing multiple copies of the same string by
different threads, the pool of strings used by this call is synchronized
internally. Returning a unique value in all situations is required by the
contract of String.intern(), but becomes a burden if all we need is
to reduce the number of duplicated strings, not necessarily eliminating each and
every of them.
- Prior to JDK 8, the internal pool is implemented as a fixed-size hash table
with chains of entries, similar to HashMap$Entry, hanging off it. Because
of the fixed-size table, a large pool results in long entry chains
which take a lot of time to traverse. This may degrade performance of
String.intern() severely even for a single-threaded application. The author
once observed a more than threefold slowdown for an application that was interning
about 1.5M duplicated strings that corresponded to fewer than 300,000 unique
values. The last number would be the actual size of the string pool.
- The internal pool of strings can grow as needed, and strings
that are not referenced from anywhere in the application are automatically
garbage collected from it. However, in HotSpot JVM prior to JDK 8, this pool is
kept in a separate heap area called Permanent Generation. If it grows large, you
may run out of PermGen space and will need to set a higher value for it using the
-XX:MaxPermSize=<size> JVM flag.
- Adding a string to the String.intern() pool incurs some memory
overhead. As a result, in some situations you may discover that after adding
a call to String.intern() the total memory usage decreased only
marginally, or even increased. That is more likely to occur if in addition
to duplicated strings the application happens to intern many non-duplicated
ones.
For these reasons, if the number of different values of duplicate strings
(what JOverflow reports as "duplicate values" on the top of section
10) is high, it may make sense to try alternative string intern table
implementations. Since a custom table is not required to return a unique value
for each and every string, such a table may use very loose synchronization. It
may use WeakReferences to allow the GC to collect unused strings, or it
can be implemented as a fixed-size, small enough array and rely on the fact
that least recently used strings contained in it, including the totally unused
ones, will eventually be overwritten with newer ones. The author once wrote and
tried several quick-and-dirty custom intern tables, each of them configured to
use no more than 1-2 per cent of the heap. For the single-threaded application
that they were used with, they all resulted in considerable (about 20 per cent)
heap size reduction and some (5-6 per cent) speed up due to reduced GC pressure.
In contrast, using String.intern() resulted in similar memory savings
but an incredibly high (more than 3 times) application slowdown. However, that
was observed with JDK 6; the improved string pool implementation in JDK 8 is
supposed to address the main reason of this slowdown, as previously discussed.
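A quick-and-dirty sketch of such a fixed-size, lossy intern table (illustrative only; unlike String.intern(), it makes no uniqueness guarantee and simply lets colliding entries overwrite each other):

class LossyInterner {
    // Deliberately unsynchronized, as in the single-threaded experiments described above
    private final String[] slots;

    LossyInterner(int size) {
        slots = new String[size];
    }

    String intern(String s) {
        int index = (s.hashCode() & 0x7fffffff) % slots.length;
        String cached = slots[index];
        if (s.equals(cached)) {
            return cached; // reuse the copy already in the table
        }
        slots[index] = s;  // overwrite whatever occupied this slot before
        return s;
    }
}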
Section 13: Duplicate Primitive Array Stats
Duplicated primitive arrays are multiple instances of primitive arrays,
such as int[] or byte[] with identical type, length and contents.
Note that JOverflow only considers standalone arrays. That is, for instance,
char[] arrays that back String instances are not checked
for duplication here.
The metrics reported in this section are very similar to those for duplicated
strings reported in section 10. Assume that a heap dump contains just seven separate
primitive arrays:
int[] ia1 = new int[]{1,2,3}; int[] ia2 = new int[]{4,5,6};
int[] ia3 = new int[]{1,2,3}; int[] ia4 = new int[]{4,5,6};
int[] ia5 = new int[]{1,2,3};
int[] ia6 = new int[]{7,8,9}; byte[] ba7 = new byte[]{1,2,3};
- Total primitive arrays: the total number of primitive array objects of
all types, 7 in our example
- Unique arrays: the number of separate logical values (contents) among
all these arrays. In our example, it will be 4:
int[]{1,2,3}; int[]{4,5,6}; int[]{7,8,9}; byte[]{1,2,3};
- Duplicated values: the number of separate logical values (contents)
among arrays that are duplicated. 2 in our example: int[]{1,2,3}; int[]{4,5,6};
- Overhead: the size, in bytes, of redundant primitive arrays.
In our example that's 3 arrays: int[]{1,2,3}; int[]{1,2,3}; int[]{4,5,6};
Sections 14 and 15: Number, overhead and nearest fields/reference chains for duplicate
primitive arrays
Information in these sections is collected, aggregated and presented in the
same way as for duplicated strings reported in sections 11 and 12.
Section 16: WeakHashMaps with references from values to keys
Here JOverflow reports instances of java.util.WeakHashMap which contain
key-value pairs (K, V) such that a field in V points
back at K (or some other key in the same map). If there is such a
reference, K and V will not be removed from the map even when
there are no other references to K. This, in effect, breaks the
weakness property of the map and creates a memory leak. Note that a weak
reference from V back to K is fine - a WeakHashMap
will work as intended in this case.
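A sketch of the pattern this section flags, using invented Session and SessionInfo classes:

import java.util.Map;
import java.util.WeakHashMap;

class SessionRegistry {
    static class Session { }

    static class SessionInfo {
        final Session owner; // strong reference back to the key
        SessionInfo(Session owner) { this.owner = owner; }
    }

    private final Map<Session, SessionInfo> info = new WeakHashMap<>();

    void register(Session session) {
        // The value points back at the key, so this entry is never evicted,
        // even when nothing else references 'session'.
        info.put(session, new SessionInfo(session));
    }
}

Storing a java.lang.ref.WeakReference<Session> inside SessionInfo, instead of the Session itself, would restore the intended eviction behavior.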
The overhead of a problematic WeakHashMap is calculated as a shallow
size of all key and value objects that are linked in the above way. This is a
conservative estimate: each of these objects may point to others, that
collectively might consume a lot more memory.
Section 17: Data Fields that are always or almost always null/zero
In this section, JOverflow reports classes whose instances have
some or all data fields equal to null or zero. More precisely, in the first
subsection the tool reports every class where certain field(s) are null in
each instance of that class, whereas in the second subsection it reports
each class where certain field(s) are null in at least 90 per cent
of its instances.
For example, if we have
class A { String s; int i; ... }, and in all instances of A
field s == null, class A and field s will be reported
in the first subsection. If we have
class B { String s; int i; ...}, there are 100 instances of
B, and in 90 of them i == 0, then B and i
will be reported in the second subsection. If a class has no fields at all,
like in java.lang.Object, it will be treated as if it has all its
fields equal to null, and will be reported in the first subsection,
provided its total overhead is above the 0.1 per cent threshold.
The overhead of problematic classes is calculated as follows:
- For a class with some fields null in all/most instances, the
overhead is the combined size of these fields in bytes multiplied by the
number of problematic instances.
- For a class with all fields null in all/most instances, the
overhead is the whole instance size (including object header and alignment)
multiplied by the number of problematic instances.
Fields that are always or almost always null can create considerable overhead.
The problem may occur for two main reasons. The field may not be used at all,
essentially forgotten, in which case it should simply be removed. More often,
though, it happens that a field is defined in a superclass but used only in some
of its subclasses, and not used at all in others. In that case, one needs to
analyze usage of that field and the class hierarchy carefully. To address the
problem, it may be possible to simply move the field
to the relevant subclass, which will likely be a more natural place for it.
In other cases, one may need to make a deeper restructuring of the class
hierarchy, e.g. insert a new class in between the superclass and some of
its subclasses, and move the problematic field down into that new
intermediate class.
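A sketch of moving such a field down the hierarchy (the class names are invented):

// Before: every node pays for 'children', although only container nodes use it
abstract class NodeBefore {
    String name;
    java.util.List<NodeBefore> children; // null in most instances
}

// After: the field lives only in the subclass that actually needs it
abstract class Node {
    String name;
}

class ContainerNode extends Node {
    java.util.List<Node> children = new java.util.ArrayList<>();
}

class LeafNode extends Node {
    // no 'children' field at all
}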
Section 18: Primitive data fields with unused high bytes
In this section, JOverflow reports classes that have fields of type
char, short, int or long, where some of these
fields use no more than half of their bits. For example, an int
field whose value in each object is in the Short.MIN_VALUE ..
Short.MAX_VALUE or Byte.MIN_VALUE .. Byte.MAX_VALUE range.
If by design the given field can never have a value outside this range,
such a field can be replaced with short or byte, saving
1/2 or 3/4 of memory used by it.
Similar to null/zero data fields, JOverflow reports two groups of problematic
classes. In the first group, the given field(s) do not use their high bytes
in any instance of the given class. In the second group of classes, the
given field(s) do not use their high bytes in 90 per cent or more of instances
of the given class.
The overhead of problematic fields is calculated as the amount of memory that
would be saved if the field's type were changed to the smallest one that can
still accommodate all the values of that field.
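For example, a sketch of such a change (the class and field names are invented; the actual per-instance saving also depends on how the JVM packs and aligns fields):

class Token {
    // Before this change the field was declared as: int line; (4 bytes, although values never exceed a few thousand)
    short line; // after: 2 bytes per instance

    void setLine(int value) {
        // A range check preserves correctness if out-of-range input is ever possible
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IllegalArgumentException("line out of range: " + value);
        }
        line = (short) value;
    }
}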