
ArrayList Vs LinkedList

I was following a previous post on this that says:

For LinkedList

  • get is O(n)
  • add is O(1)
  • remove is O(n)
  • Iterator.remove is O(1)

For ArrayList

  • get is O(1)
  • add is O(1) amortized, but O(n) worst-case since the array must be resized and copied
  • remove is O(n)

So by looking at this, I concluded that if I have to do just sequential inserts into my collection for, say, 5000000 elements, LinkedList will outclass ArrayList.

And if I have to just fetch the elements from the collection by iterating, i.e. not grabbing elements in the middle, LinkedList will still outclass ArrayList.

Now, to verify my two statements above, I wrote the sample program below... But I'm surprised that my statements were proven wrong.

ArrayList outclassed LinkedList in both cases. It took less time than LinkedList both for adding elements and for fetching them from the collection. Is there anything I'm doing wrong, or do the initial statements about LinkedList and ArrayList not hold true for collections of size 5000000?

I mention size because if I reduce the number of elements to 50000, LinkedList performs better and the initial statements hold true.

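import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListTest {
    public static void main(String[] args) {
        long nano1 = System.nanoTime();

        List<Integer> arr = new ArrayList<>();
        for (int i = 0; i < 5000000; ++i) {
            arr.add(i);
        }
        System.out.println(System.nanoTime() - nano1); // time to add

        for (int j : arr) {
            ; // empty body: the JIT may be able to optimize this loop away
        }
        System.out.println(System.nanoTime() - nano1); // add + iterate (nano1 is not reset)

        long nano2 = System.nanoTime();

        List<Integer> arrL = new LinkedList<>();
        for (int i = 0; i < 5000000; ++i) {
            arrL.add(i);
        }
        System.out.println(System.nanoTime() - nano2); // time to add

        for (int j : arrL) {
            ;
        }
        System.out.println(System.nanoTime() - nano2); // add + iterate
    }
}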

Remember that big-O complexity describes asymptotic behaviour and may not reflect actual implementation speed. It describes how the cost of each operation grows with the size of the list, not the speed of each operation. For example, the following implementation of add is O(1) but is not fast:

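import java.util.LinkedList;
public class MyList<E> extends LinkedList<E> {
    @Override
    public boolean add(E e) {
        try {
            Thread.sleep(10000); // fixed 10 s delay: still O(1), but slow
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return super.add(e);
    }
}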

I suspect in your case ArrayList is performing well because it increases its internal buffer size fairly aggressively, so there will not be a large number of reallocations. When the buffer does not need to be resized, ArrayList will have faster adds.
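As a rough illustration of how rare those reallocations are, the simulation below counts reallocations and element copies for N sequential appends. It assumes the growth policy of OpenJDK 8 and later as I understand it (capacity 10 after the first add, then grow by about 50% whenever the array is full); the real policy is an implementation detail and may differ between JDK versions. The class name GrowthSimulation is made up for this sketch.

```java
/** Simulates ArrayList-style geometric growth (assumed policy: start at 10, grow by ~50%). */
public class GrowthSimulation {

    /** Returns {reallocations, elementsCopied} for n sequential appends. */
    public static long[] simulate(int n) {
        long capacity = 10;      // assumed capacity after the first add
        long reallocations = 0;
        long copied = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {                    // array is full: grow before adding
                copied += size;                        // every existing element is copied
                capacity = capacity + (capacity >> 1); // assumed growth: old + old/2
                reallocations++;
            }
        }
        return new long[] {reallocations, copied};
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        long[] r = simulate(n);
        System.out.println("appends: " + n);
        System.out.println("reallocations: " + r[0]);
        System.out.println("elements copied: " + r[1] + " (" + ((double) r[1] / n) + " per append)");
    }
}
```

Under this assumed policy, 5000000 appends need only a few dozen reallocations, and the total number of copied elements stays below 3 × N, which is what "O(1) amortized" means in practice.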

You also need to be very careful when you do this kind of profiling. I'd suggest you change your profiling code to do a warm-up phase (so the JIT has the opportunity to do some optimization without affecting your results) and average the results over a number of runs.

import java.util.ArrayList;
import java.util.List;

public class ListBenchmark {
    private static final int WARMUP = 1000;
    private static final int TEST = 1000;
    private static final int SIZE = 500000;

    public void perfTest() {
        // Warm-up: give the JIT a chance to compile buildArrayList() before measuring
        for (int i = 0; i < WARMUP; ++i) {
            buildArrayList();
        }
        // Test: average the timings over many runs
        long sum = 0;
        for (int i = 0; i < TEST; ++i) {
            sum += buildArrayList();
        }
        System.out.println("Average time to build array list: " + (sum / TEST));
    }

    public long buildArrayList() {
        long start = System.nanoTime();
        List<Integer> a = new ArrayList<>();
        for (int i = 0; i < SIZE; ++i) {
            a.add(i);
        }
        long end = System.nanoTime();
        return end - start;
    }

    // ... same for buildLinkedList
}

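(Note that a long sum of nanosecond timings is in no real danger of overflowing here: a long can hold roughly 292 years' worth of nanoseconds. System.nanoTime() is the appropriate clock for measuring elapsed time; System.currentTimeMillis() is wall-clock time with coarser resolution.)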

It's also possible that the compiler is optimizing away your empty get loops. Make sure the loop actually does something, so that the code you mean to measure really gets executed.
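A minimal sketch of an iteration loop that can't simply be discarded: the elements are summed and the result is printed, so the loop has an observable effect. The class and method names are illustrative, and this is still a single-shot timing with no warm-up; for serious measurements a harness such as JMH is the usual choice.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

/** Times a full iteration of each list while keeping the loop's result observable. */
public class IterationSink {

    /** Sums all elements; returning (and later printing) the sum gives the loop a visible effect. */
    public static long sum(List<Integer> list) {
        long total = 0;
        for (int value : list) {
            total += value;
        }
        return total;
    }

    static void time(String name, List<Integer> list) {
        long start = System.nanoTime();
        long result = sum(list);
        long elapsed = System.nanoTime() - start;
        System.out.println(name + ": sum=" + result + ", " + elapsed + " ns");
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        List<Integer> arrayList = new ArrayList<>(n);
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        time("ArrayList", arrayList);
        time("LinkedList", linkedList);
    }
}
```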

This is a bad benchmark IMO.

  • You need to repeat it in a loop multiple times to warm up the JVM.
  • You need to DO something in your iterative loop, or it can be optimized away.
  • ArrayList resizes, which is costly. If you had constructed the ArrayList as new ArrayList(500000), it would be allocated in one go, and then all the adds would be quite cheap (one preallocated backing array).
  • You don't specify your JVM memory settings. It should be run with -Xms equal to -Xmx (heap fully preallocated), set high enough that no GC is likely to be triggered.
  • This benchmark doesn't cover the most unpleasant aspect of LinkedList: random access (an iterator isn't necessarily the same thing). If you make random list.get calls on a large collection, say as many as 10% of its size, you will find linked lists are awful at grabbing anything other than the first or last element.
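The random-access point can be checked with a sketch like the following. It is an illustration built on arbitrary choices, not a rigorous benchmark: the list size (100000), the number of lookups (10% of the size), the fixed seed, and the three timing rounds are all assumptions made for this example. The ArrayList is preallocated with new ArrayList<>(size), as suggested above, so it never resizes while being filled.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

/** Sketch: random get(index) on ArrayList vs LinkedList. */
public class RandomAccessSketch {

    /** Performs `lookups` random get() calls and returns the sum so the work is not discarded. */
    public static long randomGets(List<Integer> list, int lookups, long seed) {
        Random random = new Random(seed);   // same seed => same indices for both lists
        long total = 0;
        for (int i = 0; i < lookups; i++) {
            total += list.get(random.nextInt(list.size()));
        }
        return total;
    }

    public static void main(String[] args) {
        int size = 100_000;
        int lookups = size / 10;                         // "10% of the size"
        List<Integer> arrayList = new ArrayList<>(size); // preallocated: no resizing while filling
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < size; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }
        for (int round = 0; round < 3; round++) {        // later rounds run already-compiled code
            long t0 = System.nanoTime();
            long a = randomGets(arrayList, lookups, 42);
            long t1 = System.nanoTime();
            long b = randomGets(linkedList, lookups, 42);
            long t2 = System.nanoTime();
            System.out.printf("round %d: ArrayList %d ms, LinkedList %d ms (sums equal: %b)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```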

For an ArrayList, the JDK get is what you'd expect (the snippets here are from an older, Java 6-era JDK):

public E get(int index) {
    RangeCheck(index);

    return elementData[index];
}

(Basically, it just returns the indexed array element.)

For a LinkedList:

public E get(int index) {
    return entry(index).element;
}

Looks similar? Not quite. entry is a method call, not a primitive array access, and look what it has to do:

private Entry<E> entry(int index) {
    if (index < 0 || index >= size)
        throw new IndexOutOfBoundsException("Index: "+index+
                                            ", Size: "+size);
    Entry<E> e = header;
    if (index < (size >> 1)) {
        for (int i = 0; i <= index; i++)
            e = e.next;
    } else {
        for (int i = size; i > index; i--)
            e = e.previous;
    }
    return e;
}

That's right: if you ask for, say, list.get(250000), it has to start at the head and repeatedly step to the next element, roughly 250000 accesses. (There's an optimization in the code: it starts at the head or the tail, depending on which requires fewer accesses.)
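To make the cost concrete, the helper below counts the node hops that the quoted entry(index) loop performs: index + 1 steps when walking forward from the header, size - index steps when walking backward. The class name LinkedListHops is illustrative; it mirrors the loop bounds of the JDK 6 code quoted above rather than calling into LinkedList itself.

```java
/** Counts the hops made by the JDK 6-style LinkedList.entry(index) loop quoted above. */
public class LinkedListHops {

    public static int hops(int index, int size) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (index < (size >> 1)) {
            return index + 1;      // forward loop: i = 0 .. index, starting from the header
        } else {
            return size - index;   // backward loop: i = size down to index + 1
        }
    }

    public static void main(String[] args) {
        int size = 5_000_000;
        for (int index : new int[] {0, 250_000, size / 2, size - 1}) {
            System.out.println("get(" + index + ") -> " + hops(index, size) + " hops");
        }
    }
}
```

For get(250000) on a 5000000-element list this gives 250001 hops, while get(size - 1) takes a single hop; the worst case is an index near the middle.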

An ArrayList is a simpler data structure than a LinkedList. An ArrayList has a single array of pointers in contiguous memory locations. It only has to be recreated if the array is expanded beyond its allocated size.

A LinkedList consists of a chain of nodes; each node is allocated separately and has front and back pointers to other nodes.

So what does this mean? Unless you need to insert in the middle, splice, delete in the middle, etc., an ArrayList will usually be faster. It needs fewer memory allocations, has much better locality of reference (which is important for processor caching), and so on.

To understand why the results you got do not contradict the "big O" characterization, we need to go back to first principles, i.e. the definition:

Let f(x) and g(x) be two functions defined on some subset of the real numbers. One writes

f(x) = O(g(x)) as x -> infinity

if and only if, for sufficiently large values of x, f(x) is at most a constant multiplied by g(x) in absolute value. That is, f(x) = O(g(x)) if and only if there exists a positive real number M and a real number x_0 such that

|f(x)| <= M |g(x)| for all x > x_0.

In many contexts, the assumption that we are interested in the growth rate as the variable x goes to infinity is left unstated, and one writes more simply that f(x) = O(g(x)).

So the statement "add1 is O(1)" means that the time cost of an add1 operation on a list of size N stays bounded by a constant C_add1 as N tends to infinity.

And the statement "add2 is O(1) amortized over N operations" means that the average time cost of one of a sequence of N add2 operations stays bounded by a constant C_add2 as N tends to infinity.
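As a worked illustration of the amortized case, assume (as a simplification) that the array grows by an exact constant factor r > 1 each time it is full, starting from some capacity c_0; OpenJDK's ArrayList uses roughly r = 1.5. Counting one unit of work per element written or copied, for N appends:

```latex
\text{Resizes happen at capacities } c_0,\; c_0 r,\; c_0 r^2,\; \dots,\; c_k = c_0 r^k < N. \\
\text{Copies} = \sum_{i=0}^{k} c_0 r^{i} = c_k \sum_{j=0}^{k} r^{-j} \le c_k \frac{r}{r-1} < N \frac{r}{r-1}. \\
\text{Total work} \le N + N \frac{r}{r-1}
  \;\Longrightarrow\; \text{average work per append} \le 1 + \frac{r}{r-1} = O(1). \\
r = 1.5:\quad \frac{r}{r-1} = 3, \qquad \text{average} \le 4 \text{ element writes per append.}
```

In this simplified model the amortized cost per add is bounded by a constant that does not depend on N, even though an individual add that triggers a resize costs O(N).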

What it does not say is what those constants C_add1 and C_add2 are. In fact, the reason that LinkedList (add1) is slower than ArrayList (add2) in your benchmark is that C_add1 is larger than C_add2.

The lesson is that big O notation does not predict absolute or even relative performance. All it predicts is the shape of the performance function as the controlling variable gets very large. This is useful to know, but it doesn't tell you everything you need to know.

1) Underlying Data Structure: The first difference between ArrayList and LinkedList comes from the fact that ArrayList is backed by an array, while LinkedList is backed by a doubly linked list of nodes. This leads to further differences in performance.

2) LinkedList implements Deque: Another difference between ArrayList and LinkedList is that, apart from the List interface, LinkedList also implements the Deque interface, which provides first-in-first-out operations through add() and poll() and several other Deque functions.

3) Adding elements: Adding an element to an ArrayList is an O(1) operation if it doesn't trigger a resize of the array. If it does, that particular add becomes O(n), because every element is copied into a new, larger array (averaged over many adds it is still O(1) amortized). On the other hand, appending an element to a LinkedList is an O(1) operation, as it doesn't require any navigation.

4) Removing an element from a position: To remove an element at a particular index, e.g. by calling remove(index), ArrayList performs a copy operation, which makes it close to O(n), while LinkedList needs to traverse to that point, which makes it O(n/2), as it can traverse from either direction based on proximity.

5) Iterating over ArrayList or LinkedList: Iteration is an O(n) operation for both LinkedList and ArrayList, where n is the number of elements.

6) Retrieving an element from a position: The get(index) operation is O(1) in ArrayList, while it is O(n/2) in LinkedList, as it needs to traverse to that entry. In Big O notation, though, O(n/2) is just O(n), because constants are ignored.

7) Memory: LinkedList uses a wrapper object, Entry (called Node in JDK 7 and later), which is a static nested class storing the data plus references to the next and previous nodes, while ArrayList just stores the data in an array.

So the memory requirement seems lower for ArrayList than for LinkedList, except while the array is being resized and its contents are copied from one array to another.

If the array is large enough, it may take a lot of memory at that point and trigger garbage collection, which can slow response time.

From all the above differences between ArrayList and LinkedList, ArrayList looks like the better choice in almost all cases, except when you do add() operations much more frequently than remove() or get().

It's easier to modify a linked list than an ArrayList, especially if you are adding or removing elements at the start or end, because a linked list internally keeps references to those positions and they are accessible in O(1) time.

In other words, you don't need to traverse the linked list to reach the position where you want to add elements. When you do need to traverse, for example when inserting or deleting an element in the middle of a linked list, the addition becomes an O(n) operation.
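The O(1) case described here is what a ListIterator gives you on a LinkedList: once the iterator is positioned, add() and remove() only relink neighbouring nodes. A minimal sketch that edits a list in a single pass (the class name and the even/multiple-of-3 rule are made up for illustration):

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

/** Edits a LinkedList in one pass via ListIterator: each add/remove at the cursor only relinks nodes. */
public class IteratorEdits {

    /** Removes every even number and inserts a marker (-1) after every odd multiple of 3. */
    public static List<Integer> edit(List<Integer> input) {
        List<Integer> list = new LinkedList<>(input);
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 2 == 0) {
                it.remove();          // unlinks the current node; nothing is shifted
            } else if (value % 3 == 0) {
                it.add(-1);           // links a new node right after the current one
            }
        }
        return list;
    }

    public static void main(String[] args) {
        // prints [1, 3, -1, 5, 7, 9, -1]
        System.out.println(edit(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9)));
    }
}
```

The same loop is also correct on an ArrayList, but there each remove() or add() shifts the tail of the backing array, so the whole pass can degrade to O(n²).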

In my opinion, use ArrayList over LinkedList for most practical purposes in Java.

It's hard to find a good use case for LinkedList. If you only need the Deque interface, you should probably use ArrayDeque. If you really need the List interface, you will often hear the suggestion to always use ArrayList, because LinkedList performs really poorly when accessing a random element.
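For the Deque-only use case, ArrayDeque is used through the same Deque interface that LinkedList implements. The sketch below (class and method names are illustrative) uses offerLast/pollFirst for FIFO queue behaviour and push/pop for LIFO stack behaviour. Because both classes implement Deque, replacing new ArrayDeque<>() with new LinkedList<>() would compile and, for non-null elements like these, behave the same apart from performance (ArrayDeque rejects null elements, LinkedList accepts them).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** ArrayDeque used through the Deque interface, both as a FIFO queue and as a LIFO stack. */
public class DequeDemo {

    /** Enqueues at the tail and dequeues from the head: first in, first out. */
    public static String fifo(int... values) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int v : values) {
            queue.offerLast(v);
        }
        StringBuilder out = new StringBuilder();
        while (!queue.isEmpty()) {
            out.append(queue.pollFirst()).append(' ');
        }
        return out.toString().trim();
    }

    /** Pushes onto the head and pops from the head: last in, first out. */
    public static String lifo(int... values) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int v : values) {
            stack.push(v);
        }
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) {
            out.append(stack.pop()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(fifo(1, 2, 3)); // 1 2 3
        System.out.println(lifo(1, 2, 3)); // 3 2 1
    }
}
```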

Unfortunately, ArrayList also has performance problems if elements at the beginning or in the middle of the list must be removed or inserted.

There is, however, a newer list implementation called GapList, which combines the strengths of ArrayList and LinkedList. It has been designed as a drop-in replacement for both ArrayList and LinkedList and therefore implements both the List and Deque interfaces. All public methods provided by ArrayList (ensureCapacity, trimToSize) are also implemented.

GapList's implementation guarantees efficient random access to elements by index (as ArrayList does) and at the same time efficient adding and removing of elements at the head and tail of the list (as LinkedList does).

You can find more information about GapList at https://dzone.com/articles/gaplist-lightning-fast-list . Get it at https://github.com/magicwerk/brownies-collections .

Big-O notation is not about absolute timings but about relative timings, and you can't compare the numbers of one algorithm with those of another.

It only tells you how the same algorithm reacts to increasing or decreasing numbers of tuples.

One algorithm might take an hour for one operation and two hours for two operations, and be O(n); another might also be O(n) and take one millisecond for one operation and two milliseconds for two operations.

Another issue when measuring on the JVM is the optimization done by the HotSpot compiler. A do-nothing loop might be eliminated by the JIT compiler.

A third thing to consider is the OS and the JVM, which use caches and run garbage collection in the meantime.

You can think of add or remove as a two-step operation.

LinkedList: If you add an element at index n, the list first moves a pointer from 0 to n-1 (or in from the tail, whichever is closer), and then performs the so-called O(1) add operation. The remove operation works the same way.

ArrayList: ArrayList implements the RandomAccess interface, which marks it as supporting fast (constant-time) access to an element by index. If you add an element at index n, it can go to index n-1 in O(1), shift the elements after n-1 one slot to the right, and then set the element in slot n.
The shifting is performed by a native method called System.arraycopy, which is pretty fast.
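The shifting step can be sketched on a plain int[] with a separate size counter. This is a simplified model of what ArrayList.add(index, element) does, assuming a backing array that already has spare capacity: the real ArrayList stores Object references and grows the array when it is full, which this sketch deliberately leaves out (it throws instead). The class name ArrayInsertSketch is hypothetical.

```java
import java.util.Arrays;

/** Simplified model of ArrayList.add(index, element) on an int[] (no growth). */
public class ArrayInsertSketch {

    /** Inserts value at index in data[0..size), shifting the tail right by one. Returns the new size. */
    public static int insertAt(int[] data, int size, int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (size == data.length) {
            throw new IllegalStateException("array is full; a real ArrayList would grow it here");
        }
        // Shift data[index..size) to data[index+1..size+1): one bulk copy of (size - index) elements.
        // System.arraycopy handles overlapping source and destination ranges correctly.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] data = new int[8];
        int size = 0;
        for (int v : new int[] {10, 20, 30, 40}) {
            size = insertAt(data, size, size, v);   // append: nothing to shift
        }
        size = insertAt(data, size, 1, 15);         // middle: shifts 20, 30, 40
        size = insertAt(data, size, 0, 5);          // front: shifts every element
        // prints [5, 10, 15, 20, 30, 40]
        System.out.println(Arrays.toString(Arrays.copyOf(data, size)));
    }
}
```

Inserting at the end copies nothing, while inserting at index 0 copies every element, which is why add(0, x) on a large ArrayList is O(n) even though System.arraycopy makes each copy cheap.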

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class ListOperationsTest {

    public static void main(String[] args) {

        // Fill BOTH lists with the same 100000 elements; the random-index adds below
        // would throw IndexOutOfBoundsException on an empty LinkedList.
        List<Integer> arrayList = new ArrayList<Integer>();
        List<Integer> linkList = new LinkedList<Integer>();
        for (int i = 0; i < 100000; i++) {
            arrayList.add(i);
            linkList.add(i);
        }

        long start = 0;
        long end = 0;
        Random random = new Random();

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            linkList.add(random.nextInt(100000), 7);
        }
        end = System.currentTimeMillis();
        System.out.println("LinkedList add, random index: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            arrayList.add(random.nextInt(100000), 7);
        }
        end = System.currentTimeMillis();
        System.out.println("ArrayList add, random index: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            linkList.add(0, 7);
        }
        end = System.currentTimeMillis();
        System.out.println("LinkedList add, index == 0: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            arrayList.add(0, 7);
        }
        end = System.currentTimeMillis();
        System.out.println("ArrayList add, index == 0: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            linkList.add(i);
        }
        end = System.currentTimeMillis();
        System.out.println("LinkedList add, index == size-1: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            arrayList.add(i);
        }
        end = System.currentTimeMillis();
        System.out.println("ArrayList add, index == size-1: " + (end - start) + " ms");

        // remove(Object): removes the first element equal to a random value,
        // which requires a linear search in both lists
        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            linkList.remove(Integer.valueOf(random.nextInt(100000)));
        }
        end = System.currentTimeMillis();
        System.out.println("LinkedList remove, random value: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            arrayList.remove(Integer.valueOf(random.nextInt(100000)));
        }
        end = System.currentTimeMillis();
        System.out.println("ArrayList remove, random value: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            linkList.remove(0);
        }
        end = System.currentTimeMillis();
        System.out.println("LinkedList remove, index == 0: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            arrayList.remove(0);
        }
        end = System.currentTimeMillis();
        System.out.println("ArrayList remove, index == 0: " + (end - start) + " ms");
    }
}

O notation analysis provides important information, but it has its limitations. It ignores constant factors, so it effectively treats every operation as taking about the same time to execute, which is not true. As @seand pointed out, linked lists internally use more complex logic to insert and fetch elements (take a look at the source code; you can Ctrl+click in your IDE). ArrayList internally only needs to insert elements into an array and increase its size once in a while (which, even though it is an O(n) operation, can in practice be accomplished pretty fast).

Cheers

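Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM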