Immutable / persistent list in Java

As a pet project I'm trying to implement an immutable list data structure in Java while minimizing copies as much as possible; I know about Google Collections but this is not what I'm after since list manipulations just return new copies of the old list.

I have come up with two different approaches to the problem; both are based on doubly linked lists, like this:

[head: element1] <--> [element2] <--> [tail: element3]

So every list consists of the tuple {head, tail}.

First, let's examine the simple case of appending or prepending an element to list A, resulting in list B:

A:                 [head: element1] <--> [element2] <--> [tail: element3]
B: [head: element0] <--> [element1] <--> [element2] <--> [tail: element3]

This is O(1). Since iteration only ever runs between a list's own head and tail, A never sees the element that was prepended or appended to B.
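A minimal sketch of the O(1) prepend case, assuming a singly linked head::tail representation instead of the doubly linked nodes described above. With forward pointers only, prepending never touches an existing node, so A and B can share all of A's nodes safely; the trade-off is that O(1) append is not available in this form. The class name `ConsList` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical persistent list: nodes are immutable and shared between versions. */
final class ConsList<T> implements Iterable<T> {
    private static final ConsList<Object> EMPTY = new ConsList<>(null, null, 0);

    private final T head;
    private final ConsList<T> tail;
    private final int size;

    private ConsList(T head, ConsList<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    @SuppressWarnings("unchecked")
    static <T> ConsList<T> empty() {
        return (ConsList<T>) EMPTY;
    }

    /** O(1): allocates one node pointing at the existing list; nothing is copied. */
    ConsList<T> prepend(T element) {
        return new ConsList<>(element, this, size + 1);
    }

    int size() {
        return size;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private ConsList<T> current = ConsList.this;

            @Override
            public boolean hasNext() {
                return current.size > 0;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                T value = current.head;
                current = current.tail;
                return value;
            }
        };
    }

    /** Copies the elements into a java.util.List, for printing and comparison only. */
    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (T t : this) out.add(t);
        return out;
    }
}
```

Building A as `ConsList.<String>empty().prepend("element3").prepend("element2").prepend("element1")` and then B as `a.prepend("element0")` leaves A at three elements while B has four, with B's tail being the very same object as A.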

It gets interesting when we try to insert or remove an arbitrary element in the list.

Indexed elements approach

Every list has a unique sequential id starting from 0. Every element has an array of {prev, next} pointers corresponding to the list ids:

  [element1] <--> [element2] <--> [element3] <--> [element4]
A:   [0] <---------> [0] <---------> [0] <---------> [0]
B:   [0] <---------> [1] <-------------------------> [1]
C:   ...

So when removing element3 from list A with id = 0, the next pointer of element2 and the prev pointer of element4 are set at id = 1 (list B) to reflect the result of the requested operation; element1 remains unaltered. When iterating over a list with id x, the correct prev or next pointer of an element is found at index min(elementIdCount - 1, x), where elementIdCount is the number of pointer entries that element holds (so the index is 0 for element1 and 1 for element2 when iterating over B with id = 1, for example).

Adding or replacing elements is done in the same way. This is also O(1), except when the element id arrays need to be resized, which should happen relatively infrequently.
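A rough sketch of the per-element pointer arrays, under several simplifying assumptions: only `next` pointers are kept (no `prev`), only removal is shown, and the lookup clamps the list id to the last entry an element holds. With that clamping rule, modifying an older list after a newer one exists would pick up pointers meant for other lists, so this sketch rejects that case and supports a linear history only. `VersionedNode`, `VersionedList` and their methods are hypothetical names, and `remove` walks to the index, so only the pointer update itself is O(1).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node: one successor entry per list id that changed it. */
final class VersionedNode<T> {
    final T value;
    // next.get(i) is the successor as seen by the list with id i. Lists with an
    // id beyond the last entry fall back to the last entry.
    private final List<VersionedNode<T>> next = new ArrayList<>();

    VersionedNode(T value, VersionedNode<T> successor) {
        this.value = value;
        next.add(successor); // entry for list id 0; null marks the end of the list
    }

    VersionedNode<T> nextFor(int listId) {
        return next.get(Math.min(next.size() - 1, listId));
    }

    void setNextFor(int listId, VersionedNode<T> successor) {
        // Lists in between did not touch this node: repeat what they resolved to.
        while (next.size() < listId) next.add(next.get(next.size() - 1));
        next.add(successor);
    }
}

final class VersionedList<T> {
    private final VersionedNode<T> head;
    private final int id;
    private final int[] newestId; // shared by every list derived from the same original

    private VersionedList(VersionedNode<T> head, int id, int[] newestId) {
        this.head = head;
        this.id = id;
        this.newestId = newestId;
    }

    static <T> VersionedList<T> of(List<T> values) {
        VersionedNode<T> head = null;
        for (int i = values.size() - 1; i >= 0; i--) {
            head = new VersionedNode<>(values.get(i), head);
        }
        return new VersionedList<>(head, 0, new int[] {0});
    }

    /** Returns a new list without the element at index; this list stays unchanged. */
    VersionedList<T> remove(int index) {
        if (id != newestId[0]) {
            throw new IllegalStateException("sketch supports a linear history only");
        }
        if (index < 0) throw new IndexOutOfBoundsException("index: " + index);
        VersionedNode<T> before = null;
        VersionedNode<T> target = head;
        for (int i = 0; i < index && target != null; i++) {
            before = target;
            target = target.nextFor(id);
        }
        if (target == null) throw new IndexOutOfBoundsException("index: " + index);
        int newId = ++newestId[0];
        if (before == null) { // removing the head only changes the {head} part of the tuple
            return new VersionedList<>(target.nextFor(id), newId, newestId);
        }
        before.setNextFor(newId, target.nextFor(id)); // the only pointer that changes
        return new VersionedList<>(head, newId, newestId);
    }

    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (VersionedNode<T> n = head; n != null; n = n.nextFor(id)) out.add(n.value);
        return out;
    }
}
```

Starting from a four-element list A, `a.remove(2)` yields B without element3 while A still iterates over all four elements, because element2 now carries two `next` entries and each list reads the one that belongs to its id.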

The big problem with this is of course garbage collection: once an element has been added to a list, it is never going to be released until ALL references to modified versions of the original list are released. This could be mitigated by making a copy of the whole list every 10 modifications, for example.

This kind of list is especially well suited to code constructs like this:

while (...)
    list = list.addElement(...);

since only one reference to the list is held at any given time.

Iterator approach

The other approach is abusing iterators in order to make the resulting list look like the expected modified version; so each modified immutable list holds a reference to its "source" list and an additional tuple {operation, element, position}, like this:

A: [head: element1] <--> [element2] <--> [tail: element3]
B: source: A, {add, element_to_add, 1}

B's iterator then calls its source list iterator (in this case A's) except when it encounters the element that has been modified (added, removed or replaced), in which case it returns that element and then continues again with the source iterator.

The obvious disadvantage here is that the nested iterator depth grows with every modified version of a list. This means that making a raw copy every now and then is necessary as well.
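The iterator approach could be sketched roughly as follows. This is only an illustration under assumptions: the class `OverlayList`, the enum `Op` and the helper `toList` are hypothetical names, the source can be any `Iterable`, and a position past the end of the source is silently ignored (except that an add at exactly the end appends).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical list version defined as "source plus one {operation, element, position}". */
final class OverlayList<T> implements Iterable<T> {
    enum Op { ADD, REMOVE, REPLACE }

    private final Iterable<T> source;
    private final Op op;
    private final T element; // ignored for REMOVE
    private final int position;

    OverlayList(Iterable<T> source, Op op, T element, int position) {
        this.source = source;
        this.op = op;
        this.element = element;
        this.position = position;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> inner = source.iterator(); // one more nesting level per version
        return new Iterator<T>() {
            private int srcIndex = 0;        // source elements passed through so far
            private boolean applied = false; // has the single modification been applied?

            @Override
            public boolean hasNext() {
                if (!applied && srcIndex == position) {
                    if (op == Op.ADD) return true;
                    if (op == Op.REMOVE) { // drop the source element at this position
                        if (inner.hasNext()) inner.next();
                        applied = true;
                    }
                }
                return inner.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (!applied && srcIndex == position) { // only ADD or REPLACE get here
                    applied = true;
                    if (op == Op.REPLACE) inner.next(); // discard the replaced element
                    return element;
                }
                srcIndex++;
                return inner.next();
            }
        };
    }

    /** Materializes any Iterable; this is also the "raw copy" that resets the nesting depth. */
    static <T> List<T> toList(Iterable<T> items) {
        List<T> out = new ArrayList<>();
        for (T t : items) out.add(t);
        return out;
    }
}
```

With A as a plain three-element list, `new OverlayList<>(a, OverlayList.Op.ADD, "element_to_add", 1)` iterates as element1, element_to_add, element2, element3, and wrapping that result in a further `OverlayList` stacks another iterator on top, which is the depth growth described above.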

Does anybody have any suggestions on how this may be improved? Also, any pointers to any data structures invented in the 60s that may be useful are more than welcome :)

You can create a head::tail like list, and get the benefits of easy creation and good memory footprint, and then provide an API that layers a skip list on top to get efficient random access when needed.

As for efficient mutation in the middle, the skip list view might have a side table mapping mutated indices to elements, and a binary-searchable array mapping each original index to its index offset after inserts and removes.

All this mapping raises the question of how to provide efficient immutable maps, for some definition of efficient. The best way I've come up with is to use B-trees, which allow O(log n) access to sortable keys and create O(log n) new nodes on each insertion or removal. The number of nodes shared between two holders of a B-tree based map after k modifications is approximately (n - k log n), which is pretty good in practice for infrequently updated maps.
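The node-sharing idea can be illustrated with path copying. The sketch below deliberately swaps in a plain, unbalanced binary search tree rather than a B-tree to stay short, so the O(log n) bound only holds while the tree happens to stay balanced. `PersistentTreeMap` and its methods are hypothetical names.

```java
/** Hypothetical immutable map: put() copies one root-to-leaf path and shares the rest. */
final class PersistentTreeMap<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        final K key;
        final V value;
        final Node<K, V> left;
        final Node<K, V> right;

        Node(K key, V value, Node<K, V> left, Node<K, V> right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private final Node<K, V> root;

    private PersistentTreeMap(Node<K, V> root) {
        this.root = root;
    }

    static <K extends Comparable<K>, V> PersistentTreeMap<K, V> empty() {
        return new PersistentTreeMap<>(null);
    }

    /** Returns a new map; this map is left untouched. */
    PersistentTreeMap<K, V> put(K key, V value) {
        return new PersistentTreeMap<>(insert(root, key, value));
    }

    private static <K extends Comparable<K>, V> Node<K, V> insert(Node<K, V> n, K key, V value) {
        if (n == null) return new Node<>(key, value, null, null);
        int c = key.compareTo(n.key);
        // Only the node on the search path is re-created; the other subtree is reused as-is.
        if (c < 0) return new Node<>(n.key, n.value, insert(n.left, key, value), n.right);
        if (c > 0) return new Node<>(n.key, n.value, n.left, insert(n.right, key, value));
        return new Node<>(key, value, n.left, n.right);
    }

    /** Returns the value for key, or null if the key is absent. */
    V get(K key) {
        Node<K, V> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return n.value;
            n = c < 0 ? n.left : n.right;
        }
        return null;
    }
}
```

Each `put` allocates as many nodes as the search path is long and shares every other node with the previous map, which is where a figure like (n - k log n) shared nodes after k modifications comes from.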

An immutable list is one whose items cannot be modified after it is created, so you are abusing the term. What you want to do is keep a mutable list and hand out immutable versions of it.

Google Guava can produce an immutable list for you; note that copyOf copies the elements rather than wrapping mutableList:

ImmutableList<T> view = ImmutableList.copyOf(mutableList);

If you want to minimize copies, you can make multiple updates to mutableList before you ask for a new immutable copy.
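The difference between a read-only view and an immutable copy matters for this suggestion, so here is a small sketch using only the JDK as a stand-in for Guava. `ViewVersusSnapshot` and its two methods are hypothetical helper names; `Collections.unmodifiableList` is the standard JDK wrapper, and the `snapshot` method behaves like a copy-on-request call in that it pays O(n) once per call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ViewVersusSnapshot {
    private ViewVersusSnapshot() {}

    /** Read-only view: O(1) to create, but later changes to the backing list show through. */
    static <T> List<T> view(List<T> backing) {
        return Collections.unmodifiableList(backing);
    }

    /** Snapshot: copies the elements once (O(n)) and never changes afterwards. */
    static <T> List<T> snapshot(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }
}
```

After taking both from the same mutable list and then appending to that list, the view reports the new size while the snapshot keeps the old one; both reject `add` with an `UnsupportedOperationException`.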

