JOverflow is a tool that analyzes Java heap dumps for a number of known memory
usage inefficiencies. This documentation is for the command-line version of JOverflow.
There is currently work in progress on integrating JOverflow with Mission Control.
The resulting interactive tool's output will likely be somewhat different, but its
basic concepts should remain the same.
How to use the tool
Downloading and running the tool
JOverflow takes a JVM heap dump in HPROF format. Files in this format usually
have .hprof extension and are typically generated with the following
command (for HotSpot VM; JRockit may use a different command):
jmap -dump:live,format=b,file=my_heap_dump.hprof <JVM process pid>
JOverflow reads the heap dump, processes the data, and produces a report
summarizing its findings. To run JOverflow:
- Build the Mission Control source code.
- Locate the built jar files for org.openjdk.jmc.common and org.openjdk.jmc.joverflow, and run the tool as
java -cp org.openjdk.jmc.joverflow-{version}.jar:org.openjdk.jmc.common-{version}.jar org.openjdk.jmc.joverflow.Main foo.hprof > foo-joverflow.txt
- For big heap dumps (over 1500M or so), you may want to adjust the JVM settings
in the command above. Typically you will need to add
-d64 -Xmx<1.2..1.5 x dump size>, as in the example below.
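For a roughly 8 GB dump, the full command line might look like this (the heap size and file names are illustrative):
java -d64 -Xmx12g -cp org.openjdk.jmc.joverflow-{version}.jar:org.openjdk.jmc.common-{version}.jar org.openjdk.jmc.joverflow.Main big_dump.hprof > big_dump-joverflow.txt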
JOverflow supports the following command line options:
- -depth_first Scan the heap in depth-first order (default). This
is usually faster, but can result in longer and less useful reference
chains leading to problematic objects. See the section on
top memory consumers
for more details.
- -breadth_first Scan the heap in breadth-first order. This is
usually slower, but generally results in shorter and more useful reference
chains for problematic objects, see above.
- -full_obj_histo Print full object histogram (normally only top
memory consumers are printed). See
Class and Object Information for more
details.
- -ref_chain_depth=<n> Print up to n elements in
reverse reference chains leading to problematic objects (default is 8).
See the section on reference chains for
problematic objects for more details.
- -ref_chain_stoppers=<prefix1,prefix2,...> Stop printing
a reference chain when a class whose name starts with any of the given prefixes is
reached (default is oracle.apps.). See the section on
top memory consumers
for more details.
- -pointer_size=<size in bytes> Explicitly specify JVM
pointer size to be used in calculations. It generally makes sense to use
this flag only for 64-bit heap dumps; see the next section for details.
An important note about heap dumps
The HPROF file format is platform-independent. Unfortunately, one (probably
unintended) consequence is that it provides very little platform-specific
information that tools like JOverflow or Eclipse Memory Analyzer (MAT) need in
order to accurately calculate how much space is consumed by Java objects in
the real JVM heap. The fundamental numbers that tools need for that are:
- Pointer size - how many bytes an object reference takes
- Object header size. In the heap, each Java object is preceded by a header
that contains a pointer to the object's class, some internal flags, etc.
- Object alignment. In the heap, objects typically cannot be placed at
arbitrary addresses. Starting addresses are at least 8-byte aligned for
both HotSpot and JRockit JVMs.
A HPROF file contains only the so-called identifier size, loosely corresponding
to the pointer size. It allows the tool to determine whether the JVM ran in
32-bit mode (in that case, identifier size is 4) or in 64-bit mode (size 8).
In 32-bit mode, the other numbers are, fortunately, fixed and are the same for both
HotSpot and JRockit:
- Pointer size is 4 bytes
- Object header size is 8 bytes. For arrays, an additional 4 bytes are used
for the array length, effectively resulting in an object header size of 12 bytes.
- Object alignment is 8 bytes.
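As a quick worked example of how these numbers combine (a sketch based on the 32-bit values above, not actual tool output):
- An instance of java.lang.Integer takes 8 bytes of header plus 4 bytes for its int value, i.e. 12 bytes, which 8-byte alignment rounds up to 16 bytes.
- A char[10] array takes 12 bytes of header (8 plus 4 for the length) plus 20 bytes of characters, i.e. 32 bytes, which is already 8-byte aligned.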
However, in 64-bit mode the situation is much more complex. It stems from the
fact that in this mode the JVM may use narrow (compressed)
references, which means that 32-bit, effectively truncated pointers are used
within a 64-bit heap. The description in the
JVM bug 7145625
accurately summarizes what happens in that case.
The above bug requests the HotSpot developers to provide the required numbers
in the HPROF file, but it is currently unclear when it's going to be addressed.
Thus, at present, JOverflow does not have a 100 percent reliable way to find out
whether wide or narrow (compressed) references are used, and what the object
alignment is. It is possible to specify pointer size explicitly, using the
-pointer_size flag. Otherwise, for 64-bit heap dumps the tool
currently uses guessing logic that works as follows:
- Initially, pointer size is assumed to be 4 bytes if the file size is less than
26GB, and 8 bytes otherwise. That is, narrow references are assumed if the
heap size (which more or less correlates with the heap dump size) is smaller
than the limit used in HotSpot for switching between narrow and wide
references.
- Object alignment is always assumed to be 8 bytes (same situation as above)
- Object header size is initially assumed to be 12 bytes (plus additional
4 bytes for arrays). However, if during
subsequent heap dump scanning JOverflow comes across any class that belongs
to jrockit.vm.* package, it assumes that the heap dump has
been generated by JRockit, and changes the header size to 8 bytes (plus
4 bytes for arrays). According to JRockit engineers, that VM uses 8-byte
headers on all platforms, so this part of our calculations is accurate,
assuming JRockit was indeed used.
- After all objects have been read from the heap dump, the tool attempts
to determine the precise value of pointer size by looking at pairs of
consecutive objects. We use the fact that object IDs, that is, numbers
by which objects in the heap dump reference each other, are actually
memory addresses of objects at the time when the heap dump was taken.
So, the difference between IDs of two consecutive objects would be
equal to the actual size, in memory, of the first object. The tool
calculates the expected object size using the current pointer size,
and then compares it with the actual object size above. If the numbers are
the same, it's assumed that the current pointer size is correct. If the
numbers are different, the alternative pointer size is used to calculate
the alternative expected object size. If the alternative object size matches
the actual size, it's assumed that the alternative pointer size is
correct. If neither number matches, the same operation is tried for another
pair of objects, until a certain maximum number of objects is reached. Only
objects that have fields of pointer and int types are used in this
calculation, to avoid uncertainties with the sizes of fields that may take less
than 4 bytes, e.g. boolean or char.
Given this guessing logic, it may well happen that for the same 64-bit heap dump
JOverflow and other tools, e.g. MAT, would report different object sizes and,
consequently, different total heap size. Furthermore, it may well happen that both
tools would be wrong. However, in the past the author has observed that changing
these numbers could affect absolute figures considerably, but, interestingly, would
not affect relative overhead estimates that much. That is, estimated memory usage
and/or absolute overhead due to, say, empty HashMaps may jump from 200MB
to 300MB, but the relative overhead only changes from 8 per cent to 9 per cent.
JOverflow heap report
A JOverflow report is divided into sections, each devoted to one or
more memory problems of the same kind. For each problem, we determine the number
of objects suffering from it, and calculate the overhead: the theoretical amount
of memory that would be saved if the problem was eliminated. The amount of
overhead can be considered a problem rank: the higher it is, the more sense it
makes to fix the given problem.
Section 0: Important Issues Overview
This section contains a summary of the most acute problems detected by JOverflow.
They are reported in detail in other sections referenced from this one. A problem
of a particular kind gets into Section 0 if its overhead is higher than the fixed
relative threshold (for example, 2 per cent for empty or sparse collections).
Threshold values have been chosen based on statistics for a number of real-life
heap dumps.
Section 1: Overall Stats
This section presents a diverse set of data. First, the fundamental values needed
to calculate object sizes and overhead are reported: pointer size, object header
size and object alignment. Next, overall heap stats are reported: the number of Java
objects in the heap and its breakdown into instances, object arrays and primitive
arrays. Finally, we report some "assorted raw stats":
various metrics that may occasionally signal certain memory usage anomalies,
that have not made their way into more sophisticated analysis yet. For example,
a high (more than 15-20 per cent) overhead due to object headers would be bad,
and most likely would be due to a very large number of small objects in the heap.
Note that when the stats in this section are calculated, the tool does not make a
distinction between objects that are parts of more complex data structures that
JOverflow recognizes (known Java collections and java.lang.String
instances) and all other objects, which we call standalone objects. In
other words, when calculating raw stats, all Java objects are treated in the
same way.
Some notable pieces of assorted raw stats are:
- Size of all *$Entry instances: reports the total size of all objects
whose class name ends with $Entry, for example HashMap$Entry.
Such objects are almost always collection implementation details, and their total
size gives some idea of overhead due to collection internals.
- Number and overhead of short arrays. These metrics are reported, since each
array in a Java heap incurs the overhead of array header and alignment, see
above. The relative value of this overhead for a short array is larger than
for a longer one. A large number of short arrays can become very wasteful.
- Overhead of all null array entries: here we just add up the size in bytes
of all null pointers in object arrays.
- Number and overhead of short Strings. The problem with short Strings is
similar to that with short arrays. The overhead is calculated as the number of
java.lang.String objects with the value of String.length() in the
given range, multiplied by the shallow size of String.
- Number and overhead of boxed numbers. Objects such as java.lang.Integer,
java.lang.Character etc. take much more memory than values
(int, char) that they wrap internally. Thus excessive usage
of such objects can be very wasteful. The overhead of boxed number objects is
calculated, for each type, as
(sizeof(Number) - sizeof(number) + pointer_size) * num_of_Number_instances,
where e.g. for Number equal to java.lang.Integer, number
is int. We add pointer_size in the above formula, since it
is expected that for each boxed number instance there is at least one pointer
somewhere, and that takes valuable space as well.
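As a hedged worked example of the formula above, assuming the 32-bit sizes discussed earlier: for Number equal to java.lang.Integer, sizeof(Integer) is 16 bytes and sizeof(int) is 4 bytes, so each boxed instance contributes 16 - 4 + 4 = 16 bytes of overhead; a million distinct Integer objects would thus account for roughly 16MB.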
There are two special subsections in this section as well. The first one gives
some overall statistics related to class loaders. The two pieces that may be
somewhat non-obvious are:
- Classloader instances with loaded classes: this is gathered by walking all
the Class objects in the heap dump and gathering references to their
loaders. As a result, we know the number of loaders that do have loaded classes,
and how many separate types they belong to.
- Classloader instances with only one loaded class: we check how many of the
above-mentioned loader objects are only referenced by one Class object,
and how many types these "single-loaded-class" loaders belong to.
The second subsection gives statistics on Strings that are either compressed
or compressible. Here by "compressible String" we mean one that contains only
characters with codes in the 0..255 range (Latin-1), and therefore its backing char[] array can
potentially be replaced with a byte[] array (or is actually replaced
with one, as done in some JDK versions - that is what the lines mentioning
byte[] refer to).
One important aspect of gathering data for this section is related to the fact
that in JDK versions prior to JDK 7u6 two or more String objects can use
different parts of the same backing char[] array (or a
single String can use only a subset of its backing array). This
means that there is generally no simple answer to questions like "how many
characters are used by this group of String objects". The same character in
some backing char[] array may be used by several String objects, or
not used at all. This is important, because in a given char[]
array some characters may be ASCII (compressible) and some not. So here we
only look at characters that are actually (logically) contained in
each String. Furthermore, when calculating "used bytes in backing arrays",
we also use only the above-mentioned characters, and we don't count characters
in a backing array if we have already seen that array before. This can lead
to the "number of used bytes in backing arrays" in this section being
considerably smaller than the total number of bytes in String backing arrays
as presented in the Class and Object Information
section.
Section 2: Class and Object Information
In this section, the tool reports the total number of Java classes and Java
objects. Java objects are class instances and arrays. JOverflow also reports how
many classes in the dump have no instances at all, or just one instance. Some
number of classes with no instances is present in any heap dump, since
some classes only contain static fields and methods, instances of many anonymous
classes tend to be short-lived, etc. However, a large number of classes with no
instances may signal a problem, such as machine-generated classes that are
always generated by some library but never used by the application.
Finally, an object histogram is printed for classes with instances consuming at
least 0.1 per cent of the total heap size. Typically such objects collectively
consume almost all of the heap, but represent a small fraction of the total
number of classes. To get a full list of classes, run the tool with
-full_obj_histo flag.
In the object histogram, we report the following numbers for each class: number
of instances, shallow size of instances, and implementation-inclusive size of
instances. The latter number is the same as shallow size for most classes,
except for:
- Known Java collections: container classes in java.util.* and
java.util.concurrent.* packages, and their subclasses, including
user-defined ones. For a collection, implementation-inclusive
size is the size of the collection object itself, plus the size of all
objects that are, logically, a part of this collection's implementation.
For example, for a HashMap that would be the HashMap$Entry[] array
referenced by HashMap.table field, plus the total size of all
HashMap$Entry objects referenced from it. However, the contents
of this map, that is, keys and values, are not included. Since many
collections, such as ArrayList, contain an Object[]
array in their implementation, the implementation-inclusive size for
Object[] arrays in the JOverflow object histogram is smaller
than their shallow size. That's because implementation-inclusive size for
Object[] only includes standalone arrays.
- String instances. An implementation-inclusive size for a
String is the size of the instance plus the size of its backing
char[] array. For this reason, implementation-inclusive size
of char[] arrays is also smaller than their shallow size: it
includes only standalone arrays.
The "paradox" with shallow vs implementaiton-inclusive size for char[],
Object[], HashMap$Entry etc. is easy enough to understand.
However, it turns out that occasionally implementation-inclusive size may be
smaller than shallow size for HashMap and LinkedHashMap!
That happens when a heap dump contains many instances of HashSet or
LinkedHashSet and few instances of HashMap or
LinkedHashMap. An instance of HashSet always points to an
instance of HashMap, same with LinkedHashSet and
LinkedHashMap. Shallow size for HashMap is calculated for
all HashMaps, both standalone and those that are owned by
HashSets. So, when there are many HashSets in the heap dump,
this number may end up pretty big. However, implementation-inclusive size for
HashMap is calculated only for standalone HashMaps. Thus if
there are far fewer HashMaps than HashSets, this number, based
on a small number of standalone HashMaps, may end up relatively small.
Sections 3,4: Number, size and nearest fields/reference chains for top
memory consumers
Top memory consumers are classes whose instances consume the highest amount of
memory. In this section, JOverflow provides information on data structures that
hold these objects in memory.
From the class histogram, the tool determines classes whose instances consume the
highest amount of memory. Next, it performs a depth-first or breadth-first scan of
the heap dump, starting from GC roots. Whenever an instance of one of the above classes
is detected, a whole reference chain to it is recorded. It is important that within
a reference chain, we don't distinguish individual array elements or collection
elements. So, for example, if object A is reachable via a chain looking like
GC_root_1 -> Foo.bar[0], and object B is reachable via the chain
GC_root_1 -> Foo.bar[1], JOverflow will record the same reference
chain for both, which will look like GC_root_1 -> Foo.bar -> Object[].
Furthermore, when multiple objects get reached via the same reference chain,
the information for these objects is aggregated. In the end, we know how many
objects of given types (for example, 100 instances of HashMap and 200
instances of LinkedHashMap) are reachable via the given reference chain,
and how much memory they use. A group of objects reachable via the same reference
chain is called an object cluster.
Note that breadth-first scan (you can choose it via the command-line option) is
usually slower than depth-first, but generally results in shorter and more
useful reference chains. That's because, by design, a reference chain (at
least from the given GC root) via which we get to the given object in this way
is the shortest possible one. The shortest reference chain is what people
usually use as a "primary" one and perceive as the most "logical" one. If the
same object is reachable via two reference chains, the longer one tends to
include various secondary data structures, WeakHashMaps and
so on. If there is a group of objects reachable via both short (good) and long
reference chain, depth-first scan may report half of them under the first
chain, and another half under the second. As a result, it may be less clear
where the biggest groups of problematic objects are.
Section 3 presents the above information aggregated one level further: for each
object cluster we take only the data field that references the cluster directly,
and then merge all clusters with the same nearest referencing data field. This
representation is more compact, results in fewer clusters, and is generally more
useful. That's because when the user tries to figure out how to fix a
particular problem, in many cases they need to look no further than the code
that creates and directly manages a particular object, which is in
the same class as the data field referencing that object.
Section 4 presents reference chains described in the first paragraph, printed
in reverse order - from the object cluster to the GC root. Intermediate
arrays, collections and custom linked lists are printed in the condensed form. That is,
we avoid printing internals of e.g. a HashMap, and thus a part of the
chain that in reality looks like HashMap$Entry.key <- HashMap.table[]
<- Foo.myMap would be represented as {HashMap} <- Foo.myMap.
Reference chains may still be very long, so to save space they are truncated.
By default, that happens either after the 8th element or after the first
element that belongs to an oracle.apps.* class. There are two
command line flags, -ref_chain_depth=<n> and
-ref_chain_stoppers=<prefix1,prefix2,...> that can be used
to override these values.
Section 5: Problematic Collections
In this section, known collections that suffer from problems recognized by
JOverflow are reported. For each problem, types of relevant collections, number
of collection instances, and total overhead are reported.
Known collections are defined as container
classes in java.util.* and java.util.concurrent.* packages
and their subclasses. JOverflow currently recognizes the following problems
with them (some ways of addressing these problems are suggested in
the separate section of this document):
- Empty unused: a collection is empty if it contains no elements. The
overhead of an empty collection is defined as the amount of memory that would
be saved if the collection was not allocated at all - that is, the
implementation-inclusive size in bytes of the collection (see previous section).
Empty collections where capacity may exceed size - typically
array-backed collections, such as HashMap and ArrayList -
may incur surprisingly high overhead, especially if the initial capacity is
higher than the default value. For a ConcurrentHashMap
in JDK releases prior to 7, the size of an empty instance initialized with
default values is a whopping 1200 bytes. Fortunately, in more recent JDK versions
its code has been modified to create internal ConcurrentHashMap.Segment
objects lazily, which greatly reduced this overhead.
Empty unused are empty collections that furthermore have never been used,
i.e. never contained any elements. This is determined using the long or int
modCount field that is defined in most collections and is incremented
every time a collection is updated.
- Empty used are empty collections that previously contained some
elements. That is assumed to be the case if the modCount field (see the
previous item) is not zero.
- Empty are empty collections for which we cannot determine whether
they previously contained any elements, because they don't have the
modCount field.
- Small Sparse: a collection is sparse if the number of elements in it
is less than half of its capacity. Consequently, only collections where
capacity may exceed size - typically array-backed collections - can be sparse.
The overhead is defined as the total size in bytes of empty slots (null pointers)
in the collection - that is, how much memory would be saved if the collection did
not contain any empty slots. A collection is furthermore small sparse
if its capacity does not exceed the default capacity (e.g. 16 for HashMap).
See the discussion of fixing problems in collections
on why the distinction between small and large (see below) sparse collections
is important.
- Large Sparse: these are collections that are sparse (see above) and
have a capacity larger than default. See the
discussion of fixing problems in collections
for further details.
- Boxed: these are collections that contain boxed numbers, i.e.
instances of classes such as java.lang.Integer, java.lang.Double
etc. Such collections are considered problematic since instances of boxed numbers
take a lot more memory than int, double etc. numbers that they
wrap. The overhead of a boxed collection is defined as the amount of memory that
would be saved if we replaced the collection with one (for lists) or two (for
maps) arrays of the corresponding primitive numbers. Technically, its value is
calculated as
collection_impl_size + (sizeof(Number) + pointer_size - sizeof(number)) * num_elements.
Here Number stands for java.lang.Integer, java.lang.Double
etc., and number stands for int, double, etc.
Note that currently a collection is considered boxed if a random
sample of a key or value or both returns a boxed number object.
- Vertical bar: this is applicable only to collections that are similar
to two-dimensional arrays, that is, lists of lists or lists of arrays. A
"vertical bar" is a collection where outer dimension M is (much) larger
than the inner dimension N. Such a collection is wasteful, since each short inner
array or list incurs a per-object overhead (list or array implementation size)
and requires a pointer from the outer list. The overhead of a bar collection is
defined as the amount of memory that would be saved if outer and inner dimensions
are "flipped", i.e. we replace an M by N list with N by M one.
Technically, it is calculated as (M - N) * (pointer_size + array_or_list_overhead)
- Small: these are collections that contain a very small number of elements
(currently defined as 4 or fewer). Such collections are considered problematic,
since for them the "fixed costs" (size of implementation details that don't
depend on the number of elements in collection) are comparable to, or higher
(sometimes much higher) than the total size of "workload" in that collection.
For example, the minimum amount of memory used by a HashSet, which
internally consists of several Java objects with multiple data fields each, is around
100 bytes on a 32-bit JVM. If a HashSet contains just 3 elements, its
workload size is 12 bytes - an order of magnitude smaller than its implementation
size. The overhead of a small collection is defined as the amount of memory that
would be saved if the collection is replaced with one (for lists) or two (for
maps) arrays of objects containing this collection's elements. Technically,
it is calculated as
collection_impl_size - (pointer_size * num_elements + array_overhead)
(for lists) or
collection_impl_size - 2 * (pointer_size * num_elements + array_overhead)
(for maps).
Section 6: Problematic Standalone Object Arrays
This section is similar to the previous one, but covers standalone object
arrays - that is, arrays of Java objects that are not a part of the implementation
of any collection known to JOverflow. The following problems are recognized:
- Length 0: arrays that have length zero, i.e. cannot contain any
elements. Such arrays might be useful in very few situations; usually they are
a useless by-product of methods that blindly allocate arrays with specified
length. The overhead of such an array is defined as its size in bytes, which
is array header plus possible object alignment.
- Length 1: arrays that have length 1. An array of length 1 is
equivalent to a direct reference to the object that it contains - in other words,
the whole array is unnecessary. Thus its overhead is defined as its size in bytes.
- Empty: an object array is empty if it contains only null elements.
The overhead is defined as the array size in bytes.
- Sparse: an object array is sparse if the number of non-zero elements
in it is less than half of its length. The overhead is defined as the total
size in bytes of empty slots (null pointers) in this array - that is, how much
memory would be saved if there were no empty slots.
- Boxed: these are arrays that contain boxed numbers, e.g.
java.lang.Integer. See the description for boxed collections above.
The overhead of a boxed array is calculated as
(sizeof(Number) - sizeof(number)) * num_unique_referenced_Number_objects +
pointer_size * num_references_to_Number. That is, we take care of a
situation when multiple elements of a boxed array reference the same
boxed number object. That would occur if, for example, the majority of
array's elements are instances of java.lang.Integer with small
values, for which Integer.valueOf() method returns unique cached
objects. Note that an array is considered boxed even if it contains just
one boxed number object, but the overhead strictly depends on the number
of such objects.
- Vertical bar: this is applicable only to two-dimensional arrays,
and is very similar to vertical bar collections described in the previous
section (in fact, long arrays of short lists would also fall into this category).
The overhead is defined in the same way, as the amount of memory that would
be saved if array dimensions were "flipped".
Section 7: Problematic Standalone Primitive Arrays
This section is similar to the previous two, but covers standalone primitive
arrays, that is, arrays of primitive types, such as int or
char, that are not a part of any collection known to JOverflow,
and not backing char[] arrays of java.lang.String
instances.
- Length 0: this is defined and handled in exactly the same way as
zero-length object arrays.
- Length 1: arrays that have length 1. A primitive array of length 1
is equivalent to a single field with the value that this array contains. Thus the
overhead of such an array is defined as its size in bytes plus pointer size
minus the size of the contained value (e.g. 2 bytes for a char).
- Empty: these are arrays that contain only zeros. The overhead is
defined as the array size in bytes.
- Long zero-tail (LZT): these are arrays that end with a contiguous
block of zeros that is longer than half of the array's length. Arrays
looking like this often happen to be buffers in classes such as I/O streams,
text parsers, etc., that have been allocated with excessive capacity.
The overhead for such an array is defined as the size in bytes of the block
of zeros in the end of the array.
- Unused high bytes: these are arrays of type char[], short[],
int[], long[], where each element uses no more than a
half of its bits. For example, an int[] array where each element
is in the Short.MIN_VALUE .. Short.MAX_VALUE or
Byte.MIN_VALUE .. Byte.MAX_VALUE range. If by design an element that
is out of this range can never be added to this array, such an array can be
replaced with short[] or byte[], saving 1/2 or 3/4 of
memory used by it.
Sections 8 and 9: Number, overhead and nearest fields/reference chains for
problematic collections and arrays
These sections give information on data structures that hold problematic
collections and arrays in memory. It is gathered and presented in the same way
as the information on top memory consumers in sections 3 and 4. One
difference is that the tool reports memory overhead rather than
memory usage for problematic objects. Furthermore, as we know from the
previous section, many different kinds of problems can occur with collections
and arrays, and sometimes the same object can have several problems. Thus the
tool reports both problem kinds and associated overheads for each object
cluster.
It may happen that in addition to problematic objects in some cluster, there is
also some number of "good" collections or arrays reachable via the same
reference chain. If so, the number of good objects is printed after all the
information on the bad ones in the given cluster, in both sections 8 and 9.
Knowing the number of good objects is important when choosing the way to fix
a problem with bad ones, as discussed in the next section.
Fixing known problems with collections and arrays
In many cases, the best fix to a performance problem stems from deep understanding
of the code, and architectural changes that completely eliminate problematic
objects and/or result in more appropriate data structures. The tips presented
below assume that either the application is already considered well-optimized,
or only limited, local changes can be made to it for whatever reason.
Empty collections and arrays. A radical way to avoid the
overhead of empty collections is to avoid allocating them until the first use
(lazy initialization). If a collection is referenced by a data field that's
accessed by many methods, switching to lazy initialization would require changing
the code of all these methods, so that they check if the field is null and
initialize it if so. That would make the code less readable and more brittle;
the alternative is to replace all these direct uses with calls to a single
accessor method that deals with lazy initialization. Still, such changes may be
time-consuming, and in some cases may create races in object initialization if
these methods can be called by multiple threads.
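A minimal sketch of the accessor-based lazy initialization described above (the Customer and Order classes are invented for illustration; as the text warns, this version is not safe for concurrent callers):

import java.util.ArrayList;
import java.util.List;

class Customer {
    private List<Order> orders; // not allocated until the first order is added

    private List<Order> orders() {
        if (orders == null) {
            orders = new ArrayList<>(4); // deliberately small initial capacity
        }
        return orders;
    }

    void addOrder(Order order) {
        orders().add(order);
    }

    int orderCount() {
        return orders == null ? 0 : orders.size();
    }
}

class Order { }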
For this reason, in some situations it makes sense to leave eager collection
initialization in place, but change its initial capacity to a smaller number,
e.g. use new HashMap(8) instead of new HashMap() (the latter
is equivalent to new HashMap(16)). This will reduce the overhead for
those HashMaps that stay empty, but may result in some performance
overhead due to more frequent collection resizing if many HashMaps
allocated at the same site end up growing bigger than their initial capacity.
Here, as well as in other cases described below, the information that JOverflow
provides on the number of good collections referenced by the same field may be
very helpful. If most collections are bad, the penalty for the few good
ones that will be negatively affected by the fix is likely to be negligible.
However, if a considerable number of collections allocated at the given point
in the code are good, the fix will have to be more sophisticated.
Length 0 arrays are useless in most cases, and should be avoided
whenever possible. However, one common case where this cannot be done easily is
when some array (a data field, or a value returned by some method) is handled
in many places in the code, and changing them all to check for null is
difficult. In that case, a reasonable alternative is to create an empty array
of the appropriate type once, save it in a special static field, and then
assign or return this singleton instance every time a zero size is passed in.
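A minimal sketch of that shared empty array approach (the class and field names are invented):

class Messages {
    private static final String[] NO_MESSAGES = new String[0];

    static String[] toArray(java.util.List<String> messages) {
        // Hand out the shared instance instead of allocating a new String[0] each time
        return messages.isEmpty() ? NO_MESSAGES : messages.toArray(new String[messages.size()]);
    }
}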
Length 1 arrays are wasteful, because such an array can, in principle, be
replaced with either a direct reference to the single element that it contains,
or a primitive value that it contains. However, that cannot always be done, e.g.
Bar[] Foo.bar may point to arrays of length 1 in 98 per cent of cases,
but to long arrays in the remaining cases. In that case, it may make sense to
change the type of Foo.bar from Bar[] to Object
and then make it point either to a single instance of Bar or an array.
The code that deals with bar will likely look more complex than before
because of instanceof checks, but it may be justified by the resulting
memory savings.
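A sketch of that approach, using the hypothetical Foo.bar field from the paragraph above:

class Bar { }

class Foo {
    private Object bar; // holds either a single Bar or a Bar[]

    void setBars(Bar[] bars) {
        // Store the lone element directly in the common length-1 case
        this.bar = (bars.length == 1) ? bars[0] : bars;
    }

    Bar getBar(int index) {
        if (bar instanceof Bar) {
            if (index != 0) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            return (Bar) bar;
        }
        return ((Bar[]) bar)[index];
    }
}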
Sparse collections and arrays: their overhead can be reduced by
changing the initial capacity (but see the warning above regarding possible
frequent resizing as a result). The task is easier if we know there are no
or very few "good" collections in the same cluster. Small sparse collections
are usually easier to fix. Such a collection is typically a result of creating
a standard collection with the default constructor, and then adding just a few
elements to it. To reduce the overhead, it is sufficient to hardcode a more
appropriate (smaller than default) initial capacity parameter to the
constructor. Keep in mind that one of the most popular collections,
java.util.HashMap, with default capacity 16, can by design only have
a power of two capacity. Thus, for example, invoking new HashMap(10)
will still produce a HashMap instance with capacity 16.
Collections that are large yet sparse may be a result of specifying too large
an initial capacity, removing many elements from the collection (most
array-based collections don't "shrink" in that situation), or, in some
cases, be a side-effect of automatic collection growth. For example, the
capacity of a HashMap is doubled when the number of
elements in it grows larger than 3/4 (the default load factor) of the
current capacity. Thus in the updated HashMap, only 3/8 of slots are
occupied, making it sparse.
Fixing large sparse collections is not always easy, since one needs to determine
the root cause of the problem. For sparse HashMaps with appropriate
initial capacity and no element removal, it is possible to reduce sparseness
by specifying, in the constructor, a load factor that is higher than the
default (0.75). However, this measure may affect CPU performance (any hash table
that is nearly full experiences a larger number of hash collisions), and thus
should be used with great care.
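For instance, a map expected to stabilize at around 900 entries could be created as follows (an illustrative sketch; remember the CPU caveat above):

// capacity 1024 with load factor 0.9 resizes only above roughly 921 entries
java.util.Map<String, Object> lookupTable = new java.util.HashMap<>(1024, 0.9f);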
Boxed collections can both create a considerable memory overhead and
increase GC pressure due to the large number of small objects that they contain.
Fortunately, in most cases what the user needs to store in such a collection
are actual numbers, rather than instances of java.lang.Integer etc.
Thus it is best to replace problematic boxed collections with specialized
ones that store primitive numbers. The Apache Commons library provides such
collections, though you may or may not be allowed to use it; GNU Trove is another
option, but is less likely to be allowed due to licensing issues. In many
cases, e.g. when deleting elements from the collection is not required, it
is easy to write a custom implementation.
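A minimal sketch of such a custom primitive-number collection, as a replacement for an ArrayList<Integer> (illustrative only; the libraries mentioned above provide complete, well-tested versions):

import java.util.Arrays;

class IntList {
    private int[] data = new int[8];
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow like ArrayList does
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }
}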
Vertical bar collections and arrays. As already mentioned, the easiest
way to reduce overhead of such objects is to "flip" their dimensions.
Furthermore, one can generally convert a two-dimensional array into a
single-dimensional one (albeit at the cost of making code somewhat more complex
and less readable) if, for example, there are many such arrays in the
heap and the overhead of sub-arrays is considerable.
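A sketch of the flattening idea for a numeric M x N structure (the class name and element type are invented):

class Matrix {
    private final int cols;       // N, the small inner dimension
    private final double[] cells; // one long array instead of M short ones

    Matrix(int rows, int cols) {
        this.cols = cols;
        this.cells = new double[rows * cols];
    }

    double get(int row, int col) {
        return cells[row * cols + col];
    }

    void set(int row, int col, double value) {
        cells[row * cols + col] = value;
    }
}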
Small collections. To save maximum amount of memory, a small collection
should be replaced with an array of objects. Of course, in reality it is not
always easy to do that. Perhaps the simplest case is when all collections
referenced by field Foo.c are of the same size, initialized in a
constructor and are not updated afterwards. If replacing with an array is
difficult, but at least it's known that any collection to which Foo.c
can ever point is small, it may make sense to think about an alternative,
more economical collection class. For example, one may replace a small
HashSet with an ArrayList and use the indexOf()
method of the latter instead of HashSet.contains() without a noticeable
loss of performance. Similarly, for a small number of elements a
HashMap with manual synchronization is likely to perform no worse,
or even better than a ConcurrentHashMap.
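A sketch of the ArrayList-instead-of-HashSet idea (the class name is invented):

import java.util.ArrayList;
import java.util.List;

class SmallTagSet {
    private final List<String> tags = new ArrayList<>(4);

    void add(String tag) {
        if (tags.indexOf(tag) < 0) { // linear scan, cheap for a handful of elements
            tags.add(tag);
        }
    }

    boolean contains(String tag) {
        return tags.indexOf(tag) >= 0; // replaces HashSet.contains()
    }
}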
Long zero-tail primitive arrays are largely equivalent to sparse object
arrays, and the same advice about choosing right initial capacity applies to
them.
Primitive arrays with unused high bytes should, ideally, be replaced
with arrays of more appropriate type, e.g. int[] with short[].
Section 10: Duplicate String Stats
Duplicate strings are multiple instances of java.lang.String with the
same value. For example, when Foo.foo.equals("xyz"), Foo.bar.equals("xyz"), and
Foo.foo != Foo.bar, the string "xyz" is duplicated: there are two
String instances with this logical value, referenced by the data fields
Foo.foo and Foo.bar.
Every instance of java.lang.String points to a backing char[]
array with the actual string contents. Therefore string duplication can take two
forms: two or more String instances with the same logical value can be
backed by different char[] arrays with identical contents, or by the
same char[] array. In the first case, all but one of the String
objects, and their backing arrays, are considered to be overhead by our tool. In
the second case, only the redundant String objects are overhead.
Most Java applications generate some number of duplicate strings. Sometimes
only 20-30 per cent of all String instances are unique, and the
overhead of duplicate strings can be as high as 20-25 per cent of the total
heap size. However, usually there are quite a lot of different string values
for duplicate strings, and very rarely copies of a single string take more
than one per cent of the heap. In this section, JOverflow gives a list of individual
strings causing overhead of 0.1 per cent of the heap or more. Usually this list
is pretty short. For the vast majority of duplicate strings, the contribution
of each individual string is much smaller. Thus it makes sense to consider such
strings only when they are aggregated with others based on a common reference
chain and/or the nearest referencing data field.
The exact meaning of the metrics reported in this section is explained below.
For illustration, we assume that a heap dump contains just six separate
String objects:
s1 = "foo"; s2 = "bar"; s3 = "foo";
s4 = "bar"; s5 = "abc"; s6 = "xyz";
Furthermore, s1 and s3 are backed by the same char[]
array, whereas s2 and s4 are backed by two different arrays
with identical contents.
- Total strings: the total number of java.lang.String
instances, 6 in our example
- Unique strings: the number of separate logical values among all strings.
In our example, it will be 4: "foo", "bar",
"abc", "xyz"
- Duplicate values: the number of separate logical values among strings
that are duplicated. 2 in our example: "foo", "bar"
- Overhead: the size, in bytes, of redundant String
objects and backing arrays (in our example that's two String objects,
s3 plus one of s2 and s4, and one redundant char array containing "bar").
Additionally, for each of the top duplicate strings the following numbers are
reported:
- Overhead: calculated as specified above
- Number of char arrays: how many backing char arrays exist for this
string value
- Number of objects: how many String instances with this value exist
- Maximum array length: the length of the longest backing char array.
This number may be greater than the corresponding string length due to the fact
that prior to JDK 7u6 a String (or more likely several String instances) can
be backed by a char array that is longer than the String itself (that is, the
String uses only a fragment of that array).
- Value: string value; can be truncated if too long.
Sections 11 and 12: Number, overhead and nearest fields/reference chains for duplicate strings
Information in these sections is collected and aggregated in exactly the same
way as for problematic collections reported in sections 8 and 9. The notable
highlights are:
- For each cluster, the values of several top duplicated strings are printed.
- A cluster of duplicated strings may include more than one unique string
value - that is, four String objects with values "a", "a", "b",
"b", reached via the same reference chain, are aggregated into the same
cluster.
- For each cluster, JOverflow presents the total number of duplicated strings,
the number of duplicated backing arrays, and, finally, the number of
non-duplicated strings (those with logical value unique in this heap dump)
reached via the same reference chain. This information is important for deciding
how to get rid of the particular group of duplicated strings, as discussed in the
next section.
- If some string is listed as a duplicate in the given cluster, its other
copies may belong to other cluster(s) as well. In other words, the fact that a
string is duplicated is treated as its global property, not a local one within
the given cluster.
- When calculating overhead for a particular string cluster, JOverflow
considers only the String objects within this cluster and their backing
char[] arrays. Furthermore, the same backing array is never used in
calculations more than once.
Getting rid of duplicate strings
As with problematic collections, better overall application architecture often
leads to less redundancy, fewer objects, and thus fewer duplicated objects,
including strings.
However, some operations, e.g. parsing a text document, inevitably produce
duplicated strings. Depending on how these strings are handled subsequently,
it may be possible to come up with efficient case-specific duplication
elimination methods. This is, however, beyond the scope of this document.
Below we suggest one universal technique for locally eliminating or reducing
the number of duplicated strings. This technique often works efficiently, but
sometimes it may cause considerable overhead of its own, both in terms of memory
and CPU cycles. Thus, as any optimization, it should be fully understood and
used with care.
The technique is based on string interning and works as follows. Using
JOverflow output, one can determine that some data field Foo.foo points
to a sufficiently large cluster of duplicated strings. Provided that the number
of duplicated strings in this cluster is considerably (ten times or more) higher
than the number of non-duplicated strings reported for the same cluster (that is
why this number is important!), the following can be done:
- Find out where Foo.foo is written. The simplest situation is
when it is a private field that is initialized in the constructor and never
changed afterwards. The worst case is if it's a public field and can be written
anywhere in the code.
- Add a call to String.intern() method at every line where
a value is stored into Foo.foo. For example, if there is a line in
the constructor like this.foo = s; replace it with
this.foo = s.intern()
The String.intern() method returns a unique representation for each
String object, and thus Foo.foo will be guaranteed to point at
no more than one String object for each unique value. However, you should
take into account the following:
- The call to String.intern() is not free. In particular, it may
incur a noticeable CPU overhead in multithreaded applications due to internal
synchronization. That is, to avoid storing multiple copies of the same string by
different threads, the pool of strings used by this call is synchronized
internally. Returning a unique value in all situations is required by the
contract of String.intern(), but becomes a burden if all we need is
to reduce the number of duplicated strings, not necessarily eliminating each and
every of them.
- Prior to JDK 8, the internal pool is implemented as a fixed-size hash table
with chains of entries, similar to HashMap$Entry, hanging off it. Because
of the fixed-size table, a large pool results in long entry chains
which take a lot of time to traverse. This may degrade performance of
String.intern() severely even for a single-threaded application. The author
once observed a more than threefold slowdown for an application that was interning
about 1.5M duplicated strings that corresponded to fewer than 300,000 unique
values. The last number would be the actual size of the string pool.
- The internal pool of strings can grow as needed, and strings
that are not referenced from anywhere in the application are automatically
garbage collected from it. However, in HotSpot JVM prior to JDK 8, this pool is
kept in a separate heap area called Permanent Generation. If it grows large, you
may run out of PermGen space and will need to set a higher value for it using the
-XX:MaxPermSize=<size> JVM flag.
- Adding a string to the String.intern() pool incurs some memory
overhead. As a result, in some situations you may discover that after adding
a call to String.intern() the total memory usage decreased only
marginally, or even increased. That is more likely to occur if in addition
to duplicated strings the application happens to intern many non-duplicated
ones.
For these reasons, if the number of different values of duplicate strings
(what JOverflow reports as "duplicate values" on the top of section
10) is high, it may make sense to try alternative string intern table
implementations. Since a custom table is not required to return a unique value
for each and every string, such a table may use very loose synchronization. It
may use WeakReferences to allow the GC to collect unused strings, or it
can be implemented as a fixed-size, small enough array and rely on the fact
that least recently used strings contained in it, including the totally unused
ones, will eventually be overwritten with newer ones. The author once wrote and
tried several quick-and-dirty custom intern tables, each of them configured to
use no more than 1-2 per cent of the heap. For the single-threaded application
that they were used with, they all resulted in considerable (about 20 per cent)
heap size reduction and some (5-6 per cent) speed up due to reduced GC pressure.
In contrast, using String.intern() resulted in similar memory savings
but an incredibly high (more than 3 times) application slowdown. However, that
was observed with JDK 6; the improved string pool implementation in JDK 8 is
supposed to address the main reason of this slowdown, as previously discussed.
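A quick-and-dirty sketch of such a fixed-size, lossy intern table (illustrative only; unlike String.intern(), it makes no uniqueness guarantee and simply lets colliding entries overwrite each other):

class LossyInterner {
    // Deliberately unsynchronized, as in the single-threaded experiments described above
    private final String[] slots;

    LossyInterner(int size) {
        slots = new String[size];
    }

    String intern(String s) {
        int index = (s.hashCode() & 0x7fffffff) % slots.length;
        String cached = slots[index];
        if (s.equals(cached)) {
            return cached; // reuse the copy already in the table
        }
        slots[index] = s;  // overwrite whatever occupied this slot before
        return s;
    }
}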
Section 13: Duplicate Primitive Array Stats
Duplicated primitive arrays are multiple instances of primitive arrays,
such as int[] or byte[] with identical type, length and contents.
Note that JOverflow only considers standalone arrays. That is, for instance,
char[] arrays that back String instances are not checked
for duplication here.
The metrics reported in this section are very similar to those for duplicated
strings reported in section 10. Assume that a heap dump contains just seven separate
primitive arrays:
int[] ia1 = new int[]{1,2,3}; int[] ia2 = new int[]{4,5,6};
int[] ia3 = new int[]{1,2,3}; int[] ia4 = new int[]{4,5,6};
int[] ia5 = new int[]{1,2,3};
int[] ia6 = new int[]{7,8,9}; byte[] ba7 = new byte[]{1,2,3};
- Total primitive arrays: the total number of primitive array objects of
all types, 7 in our example
- Unique arrays: the number of separate logical values (contents) among
all these arrays. In our example, it will be 4:
int[]{1,2,3}; int[]{4,5,6}; int[]{7,8,9}; byte[]{1,2,3};
- Duplicated values: the number of separate logical values (contents)
among arrays that are duplicated. 2 in our example: int[]{1,2,3}; int[]{4,5,6};
- Overhead: the size, in bytes, of redundant primitive arrays.
In our example that's 3 arrays: int[]{1,2,3}; int[]{1,2,3}; int[]{4,5,6};
Sections 14 and 15: Number, overhead and nearest fields/reference chains for duplicate
primitive arrays
Information in these sections is collected, aggregated and presented in the
same way as for duplicated strings reported in sections 11 and 12.
Section 16: WeakHashMaps with references from values to keys
Here JOverflow reports instances of java.util.WeakHashMap which contain
key-value pairs (K, V) such that a field in V points
back at K (or some other key in the same map). If there is such a
reference, K and V will not be removed from the map even when
there are no other references to K. This, in effect, breaks the
weakness property of the map and creates a memory leak. Note that a weak
reference from V back to K is fine - a WeakHashMap
will work as intended in this case.
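A sketch of the pattern this section flags, using invented Session and SessionInfo classes:

import java.util.Map;
import java.util.WeakHashMap;

class SessionRegistry {
    static class Session { }

    static class SessionInfo {
        final Session owner; // strong reference back to the key
        SessionInfo(Session owner) { this.owner = owner; }
    }

    private final Map<Session, SessionInfo> info = new WeakHashMap<>();

    void register(Session session) {
        // The value points back at the key, so this entry is never evicted,
        // even when nothing else references 'session'.
        info.put(session, new SessionInfo(session));
    }
}

Storing a java.lang.ref.WeakReference<Session> inside SessionInfo, instead of the Session itself, would restore the intended eviction behavior.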
The overhead of a problematic WeakHashMap is calculated as a shallow
size of all key and value objects that are linked in the above way. This is a
conservative estimate: each of these objects may point to others, that
collectively might consume a lot more memory.
Section 17: Data Fields that are always or almost always null/zero
In this section, JOverflow reports classes whose instances have
some or all data fields equal to null or zero. More precisely, in the first
subsection the tool reports every class where certain field(s) are null in
each instance of that class, whereas in the second subsection it reports
each class where certain field(s) are null in at least 90 per cent
of its instances.
For example, if we have
class A { String s; int i; ... }, and in all instances of A
field s == null, class A and field s will be reported
in the first subsection. If we have
class B { String s; int i; ...}, there are 100 instances of
B, and in 90 of them i == 0, then B and i
will be reported in the second subsection. If a class has no fields at all,
like in java.lang.Object, it will be treated as if it has all its
fields equal to null, and will be reported in the first subsection,
provided its total overhead is above the 0.1 per cent threshold.
The overhead of problematic classes is calculated as follows:
- For a class with some fields null in all/most instances, the
overhead is the combined size of these fields in bytes multiplied by the
number of problematic instances.
- For a class with all fields null in all/most instances, the
overhead is the whole instance size (including object header and alignment)
multiplied by the number of problematic instances.
Fields that are always or almost always null can create considerable overhead.
The problem may occur for two main reasons. The field may not be used at all,
essentially forgotten, in which case it should simply be removed. More often,
though, it happens that a field is defined in a superclass but used only in some
of its subclasses, and not used at all in others. In that case, one needs to
analyze usage of that field and the class hierarchy carefully. To address the
problem, it may be possible to simply move the field
to the relevant subclass, which will likely be a more natural place for it.
In other cases, one may need to make a deeper restructuring of the class
hierarchy, e.g. insert a new class in between the superclass and some of
its subclasses, and move the problematic field down into that new
intermediate class.
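A sketch of moving such a field down the hierarchy (the class names are invented):

// Before: every node pays for 'children', although only container nodes use it
abstract class NodeBefore {
    String name;
    java.util.List<NodeBefore> children; // null in most instances
}

// After: the field lives only in the subclass that actually needs it
abstract class Node {
    String name;
}

class ContainerNode extends Node {
    java.util.List<Node> children = new java.util.ArrayList<>();
}

class LeafNode extends Node {
    // no 'children' field at all
}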
Section 18: Primitive data fields with unused high bytes
In this section, JOverflow reports classes that have fields of type
char, short, int or long, where some of these
fields use no more than half of their bits. For example, an int
field whose value in each object is in the Short.MIN_VALUE ..
Short.MAX_VALUE or Byte.MIN_VALUE .. Byte.MAX_VALUE range.
If by design the given field can never have a value outside this range,
such a field can be replaced with short or byte, saving
1/2 or 3/4 of memory used by it.
Similar to null/zero data fields, JOverflow reports two groups of problematic
classes. In the first group, the given field(s) do not use their high bytes
in any instance of the given class. In the second group of classes, the
given field(s) do not use their high bytes in 90 per cent or more of instances
of the given class.
The overhead of problematic fields is calculated as the amount of memory that
would be saved if the field's type were changed to the smallest one that can
still accommodate all the values of that field.
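For example, a sketch of such a change (the class and field names are invented; the actual per-instance saving also depends on how the JVM packs and aligns fields):

class Token {
    // Before this change the field was declared as: int line; (4 bytes, although values never exceed a few thousand)
    short line; // after: 2 bytes per instance

    void setLine(int value) {
        // A range check preserves correctness if out-of-range input is ever possible
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new IllegalArgumentException("line out of range: " + value);
        }
        line = (short) value;
    }
}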