简体   繁体   中英

JVM space complexity details: Singly Linked Lists vs Doubly Linked Lists

I have come across a curious situation where students (I'm a teaching assistant in this context) have to implement their own version of a Singly Linked List (SLL) and compare it empirically with the Java standard library implementation of a Doubly Linked List.

And that's where it gets weird: I've seen multiple students note that the DLL profiles at roughly 0.5% extra space utilization compared to an SLL containing an equal number of elements of the same type. All the while basic analysis of the data structures tells me that a SLL has 2 references per node (1 to the next element, and 1 to the contained value) whereas a DLL has 3 (an additional one reference to the previous element). In other words, that is 50% more space usage per node (disregarding the size of the contained value).

The contained values are mostly Integer value objects so I don't think the size of the contained values matters too much here.

What causes this 2 orders of magnitude difference? I'm not entirely sure that "JVM/collections libraries optimization" can cover the whole difference; otherwise it'd have to be one hell of a JVM/java std lib optimization.

The space used should be the same on the Oracle JVM/OpenJDK for a 64-bit JVM with 32-bit references (Compressed oops)

For a Node with two references

header: 12 bytes
two references: 8 bytes
alignment padding: 4 bytes

total is 24 bytes per node as all objects are aligned by 8 byte offsets by default.

For a Node with three references

header: 12 bytes
three references: 12 bytes
alignment padding: 0 bytes

the total is 24 bytes again.

The real question is why did you see any difference at all. This is most likely due to inaccurate memory accounting.

The JVM uses a TLAB (Thread Local Allocation Buffer) This allows the threads in a JVM to grab chunks of memory and allocate concurrently from those chunks. On the downside, you only see how much memory is used from the common Eden space ie you have no idea how much of each chunk is used.

A simple way around this is to turn off the TLAB which gives you byte-by-byte memory account (at the cost of some performance)

ig try -XX:-UseTLAB on the command line to disable the TLAB and you will see the size of every object allocated.

It's hard to see why there's any difference at all.

First note that a Java object has significant overhead in the form of its header. That reduces your 50% expectation right there.

Next, when you consider that the references are typically 4 bytes wide (given compressed OOPs on a 64-bit HotSpot), but that memory is always allocated in blocks whose size is divisible by 8, you can see that what was left as unused 4 bytes at the end of one structure becomes your third reference in the DLL example.

In addition to what Marko said about the memory overhead of each linked-list node object, your "Integer value objects" stored in these nodes might not be as small as you think. The element type of java's DLL is a generic parameter, and generic parameters in java are always objects, (never primitives,) so even though you may be adding int s to Java's DLL, they get converted to objects (see boxing/unboxing) and stored as objects.

If your students' SLLs store actual primitive int s, then I would actually expect their classes to occupy considerably less space than that of Java's DLL. If your students store Integer objects, then you should consider the fact that the space occupied by these objects further dilutes whatever difference exists between the expected space occupied by the two classes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM