Leaking Memory in Java

Jeroen van Erp

Don't we all remember the days when we programmed in C or C++? You had to use new and delete to explicitly create and remove objects, and sometimes you even had to malloc() a block of memory. With all these constructs you had to take special care to clean up afterwards, or else you would leak memory.

Now, however, in the days of Java, most people aren't that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will clean up behind you. In the normal case that is perfectly true. But sometimes the Garbage Collector can't clean up, because you still hold a reference to an object without realizing it.

I stumbled across this small program while reading JavaPedia, which clearly shows that Java is also capable of inadvertent memory leaks.

import java.util.ArrayList;

public class TestGC {
  private String large = new String(new char[100000]); // 100,000 chars per instance

  public String getSubString() {
    return this.large.substring(0,2); // may share large's char array
  }

  public static void main(String[] args) {
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString()); // every substring stays reachable
    }
  }
}

Now, if you run this, you'll see that it crashes with something like the following stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.lang.String.<init>(String.java:174)
	at TestGC.<init>(TestGC.java:4)
	at TestGC.main(TestGC.java:13)

Why does this happen? We should only be storing 1,000,000 Strings of length 2, right? That would amount to about 40 MB, which should fit in the heap easily. So what happened here? Let's have a look at the substring method in the String class.

public class String {
  /** The value is used for character storage. */
  private final char value[];
  /** The offset is the first index of the storage that is used. */
  private final int offset;
  /** The count is the number of characters in the String. */
  private final int count;

  // Package private constructor which shares value array for speed.
  String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
  }

  public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
      throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
      throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
      throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
      new String(offset + beginIndex, endIndex - beginIndex, value);
  }

  // ... (remaining members omitted)
}

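We see that substring creates the new String through the package-private constructor, and that constructor's one-line comment immediately reveals the problem: the character array is shared with the large string. So instead of storing very small substrings, we were keeping the entire 100,000-character array alive every time, just with a different offset and length. (This describes the String implementation of the Sun JDK at the time of writing; since JDK 7 update 6, substring copies the requested characters into a new array, so this particular leak no longer occurs.)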
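To make the sharing concrete, here is a minimal sketch that models the pre-JDK 7u6 layout (a char array plus an offset and a count) with a hypothetical SharedSlice class. It is not the real java.lang.String, whose internals can't be inspected this way. It mirrors both constructor styles: a public one that copies its range with Arrays.copyOfRange, and a package-private one that shares the caller's array. The static copyOf method is a hypothetical stand-in for the trimming behavior of the old String(String) copy constructor.

```java
import java.util.Arrays;

// Hypothetical model of the pre-JDK 7u6 String layout; not java.lang.String.
public class SharedSlice {
    final char[] value; // backing storage, possibly shared
    final int offset;   // first index of value that is used
    final int count;    // number of visible characters

    // Copying constructor, like String(char[] value, int offset, int count):
    // keeps only the requested range.
    public SharedSlice(char[] value, int offset, int count) {
        this.value = Arrays.copyOfRange(value, offset, offset + count);
        this.offset = 0;
        this.count = count;
    }

    // Sharing constructor, like the package-private String(int, int, char[]):
    // holds on to the caller's whole array.
    SharedSlice(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Mirrors the old substring(): returns a view over the same array.
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
        return (beginIndex == 0 && endIndex == count) ? this
                : new SharedSlice(offset + beginIndex, endIndex - beginIndex, value);
    }

    // Stand-in for the old String(String): copies only when the backing
    // array is larger than the visible characters.
    static SharedSlice copyOf(SharedSlice s) {
        return s.value.length > s.count ? new SharedSlice(s.value, s.offset, s.count) : s;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice large = new SharedSlice(new char[100000], 0, 100000);
        SharedSlice view = large.substring(0, 2);
        SharedSlice trimmed = copyOf(view);
        // prints "2 chars visible, backing array 100000"
        System.out.println(view.count + " chars visible, backing array " + view.value.length);
        // prints "2 chars visible, backing array 2"
        System.out.println(trimmed.count + " chars visible, backing array " + trimmed.value.length);
    }
}
```

The two-character view keeps the full 100,000-element array reachable for as long as the view itself is reachable, which is exactly what the list in TestGC does a million times over.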

This problem extends to other operations, like String.split() and . The problem can be easily avoided by adapting the program as follows:

import java.util.ArrayList;

public class TestGC {
  private String large = new String(new char[100000]); // 100,000 chars per instance

  public String getSubString() {
    return new String(this.large.substring(0,2)); // <-- fixes leak: the copy owns a 2-char array
  }

  public static void main(String[] args) {
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString()); // large can now be GCed after each iteration
    }
  }
}
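A rough estimate shows why the first version runs out of memory while the fixed one does not. This back-of-the-envelope sketch counts only character data, assuming 2 bytes per char and ignoring object headers and the String objects themselves, so real totals are higher; RetainedEstimate is a hypothetical helper written for this illustration.

```java
// Estimate of the character data kept alive by the 1,000,000 substrings
// in TestGC. Object headers and String fields are deliberately ignored.
public class RetainedEstimate {
    static final long SUBSTRINGS = 1_000_000L;
    static final long LARGE_LENGTH = 100_000L; // chars in each "large" String
    static final long SUB_LENGTH = 2L;         // chars in each substring
    static final long BYTES_PER_CHAR = 2L;     // a Java char is 16 bits

    // Before the fix: every substring pins its parent's full char array.
    static long retainedBeforeFix() {
        return SUBSTRINGS * LARGE_LENGTH * BYTES_PER_CHAR;
    }

    // After the fix: every substring owns a 2-char array of its own.
    static long retainedAfterFix() {
        return SUBSTRINGS * SUB_LENGTH * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + retainedBeforeFix() + " bytes"); // 200000000000
        System.out.println("after fix:  " + retainedAfterFix() + " bytes");  // 4000000
    }
}
```

Roughly 200 GB of pinned character data cannot fit in any ordinary heap, whereas about 4 MB of payload (plus per-object overhead) easily does.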

I have often heard, and have myself shared, the opinion that the String copy constructor is useless and causes problems because the copies are not interned. In this case, however, it earns its keep: when the source array is larger than the string, it copies only the characters actually used, which trims the character array and keeps us from holding a reference to the very large String.
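On JDKs before 7u6, String.split() hands back substrings that share the original string's array too, which matters when you split a large message and keep only a few tokens. A minimal sketch of a defensive copy, assuming you want every token detached from the original text; splitDetached is a hypothetical helper name, not a JDK method.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCopy {
    // Splits a (possibly very large) message and wraps each token in
    // new String(...). On JDKs before 7u6 that copy keeps only the token's
    // characters, releasing the message's array; on later JDKs split()
    // already returns independent strings, so the wrap is redundant but harmless.
    static List<String> splitDetached(String message, String regex) {
        String[] parts = message.split(regex);
        List<String> tokens = new ArrayList<String>(parts.length);
        for (String part : parts) {
            tokens.add(new String(part));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = splitDetached("alpha,beta,gamma", ",");
        System.out.println(tokens); // prints [alpha, beta, gamma]
    }
}
```

Copying every token costs an extra allocation per token, so it pays off mainly when the source text is large and the kept tokens are few and long-lived.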

Comments (11)

  1. [...] Devlib wrote an interesting post today!.Here’s a quick excerptNow however, in the days of Java, most people aren’t that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will take care of cleaning up behind you. This is of course totally true in all … [...]

  2. Sherif Mansour

    October 4, 2007 at 2:04 pm

    Hi There,
    Thanks for the insightful article! I found it quite useful, especially for understanding why Java OutOfMemoryErrors happen...
    Sherif

  3. Jos Hirth

    October 5, 2007 at 1:30 am

    Well, that's not a memory leak. See:
    http://en.wikipedia.org/wiki/Memory_leak

    The behavior is intentional: it trades memory for performance. Like most things in the standard library (e.g. collections), it's optimized for general usage and, well, generally it's alright. But you certainly shouldn't tokenize a really big string this way.

    The classic type of memory leak doesn't exist in managed languages. The only thing we can produce are so-called reference leaks, that is, referencing stuff (and thus preventing it from being GCed) for longer than necessary (or for all eternity).

    Fortunately it's easy to avoid - for the most part.

    The important things to know:

    Locally defined objects can be GCed as soon as there are no more references to them. Typically that's the end of the block they are defined in (if you don't store the reference anywhere). If you do store references, be sure to remove them when you don't need them anymore.

    If you overwrite a reference with a new object, the new object is created first and /then/ the reference is overwritten, which means the old object can only be GCed /after/ the new object has been created.

    Usually this doesn't matter. However, if you want to overwrite an object which is so big that it only fits once into the memory, you'll need to null the reference before creating/assigning the new instance.

    Eg:
    //FatObject fits only once into memory
    FatObject fatty;
    fatty=new FatObject();
    fatty=new FatObject();

    Will bomb with OOME. Whereas...

    FatObject fatty;
    fatty=new FatObject();
    fatty=null;
    fatty=new FatObject();

    Will be fine, because the second creation of the FatObject will trigger a full GC and the GC will be able to clear enough memory (since the old reference has been nulled).

    Well, that rarely matters, but it's good to know.

  4. [...] Jos Hirth wrote this in response to this post by Jeroen van Erp. [...]

  5. links for 2007-10-06 - smalls blogger

    October 6, 2007 at 2:45 am

    [...] Xebia Blog Leaking Memory in Java (tags: java memoryleak programming jvm) [...]

  6. James McInosh

    October 9, 2007 at 11:07 pm

    I don't know which version of the JVM you are running, but when it constructs a new string using this constructor:

    String(char value[], int offset, int count)

    It sets the value using this:

    this.value = Arrays.copyOfRange(value, offset, offset+count);

  7. creyle

    October 10, 2007 at 1:48 am

    To put it more plainly: because the underlying big char arrays are still referenced, the arrays of all the TestGC objects created in the big for-loop could not be GCed. That's the problem.

    Thanks

  8. Jeroen van Erp

    October 10, 2007 at 8:36 am

    James,

    True for String(char[] value, int offset, int count), but not for String(int offset, int count, char[] value). The constructor you mention is a public constructor. The constructor that is called from the substring method is a package private constructor.

  9. Ryan

    January 31, 2008 at 8:37 am

    This is not a memory leak. As Jos Hirth said, this is trading memory for speed.

    As soon as you remove the substrings from the ArrayList all the memory that has been allocated for it will be freed. No memory leak there.

  10. Chris

    July 8, 2008 at 9:27 pm

    I'm leaving this for those who google to find...

    To those saying it's not a memory leak: you are being very strict with the term. I, and others I know, have spent many man-months tracking down leaks caused by this "general" use case.

    Well, unfortunately, it's not very general. The problem is that any substring (or piece of a split string) you keep will hold on to the whole block. When splitting up large JMS text messages, for example, this will leave the entire message in memory for an unspecified time.

    It is a real problem that a general-purpose algorithm gives you a leak-like effect without any warning in the Javadocs.

    User Beware.

  11. […] from here and […]
