
What is the fastest way to store an unknown amount of strings in Java?

I want to store an unknown number of strings and later read them in the order they were added. The only features I need are:

  • The ability to add an unknown number of strings without slowing down because of resizing
  • The ability to read the elements in the order they were added

The problem is that I want to output the strings from part of a trie, so counting the strings before returning them would double the time needed for the operation.

(Another solution would be to keep track of the number of strings in the trie using an attribute, but since I want to return just a part of the trie, this isn't a perfect solution either.)

LinkedList<String> sounds like a good bet to me...

  • Maintains order
  • O(1) addition at head or tail
  • O(1) removal at head or tail
  • Cheap iteration

It's expensive to get at an arbitrary element, which is the normal reason not to use it... but it sounds like that isn't a problem in your case.
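As a minimal sketch of this suggestion (the class name LinkedListOrderDemo and the collect method are illustrative, not part of the original answer), strings are appended at the tail and later read back in insertion order:

```java
import java.util.LinkedList;
import java.util.List;

class LinkedListOrderDemo {
    // Appends each input string at the tail of a LinkedList. Each add() is
    // O(1) and never copies the elements that are already stored.
    static List<String> collect(String... inputs) {
        List<String> result = new LinkedList<>();
        for (String s : inputs) {
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = collect("tea", "ted", "ten");
        // Iteration visits the elements in the order they were added.
        for (String w : words) {
            System.out.println(w);
        }
    }
}
```

Only add and sequential iteration are used here, so the slow positional access of LinkedList never comes into play.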

An ArrayList is generally faster than a LinkedList. If you don't specify an appropriate initial capacity, though, each time the capacity is exhausted it has to allocate a new, larger array (roughly 1.5 times the old capacity in current OpenJDK implementations) and copy the elements into it.

You could use a LinkedList to avoid this cost, but the average time will probably be higher.
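If a rough upper bound on the number of strings is available, the resizing cost described above can be reduced by passing an initial capacity to the ArrayList constructor. The sketch below assumes such an estimate exists; the expectedSize parameter and the class name are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class PresizedListDemo {
    // expectedSize is a hypothetical estimate supplied by the caller. If it is
    // too small, the list still grows automatically, just with extra copying.
    static List<String> collect(String[] inputs, int expectedSize) {
        List<String> result = new ArrayList<>(expectedSize);
        for (String s : inputs) {
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = {"tea", "ted", "ten"};
        List<String> words = collect(inputs, 16);
        System.out.println(words);
    }
}
```

An overestimate wastes some memory, while an underestimate only brings back the occasional reallocation, so a coarse guess is usually good enough.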

Whichever collection you use, if memory runs short the GC will kick in, which can also introduce some delay. A truly unbounded "unknown amount" cannot be stored in any in-memory collection. If "unknown" can be so large that it rules out an in-memory collection, you'll have to use a file or a database.
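For the file-based fallback, one possible sketch is to write one string per line and stream the lines back later. This assumes the strings contain no line breaks; the class and method names are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Consumer;

class FileBackedStrings {
    // Writes one string per line. Assumes no string contains a line break.
    static void writeAll(Path file, Iterable<String> strings) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String s : strings) {
                out.write(s);
                out.newLine();
            }
        }
    }

    // Streams the strings back in the order they were written, handing each
    // one to 'action' without holding the whole file in memory.
    static void forEachLine(Path file, Consumer<String> action) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                action.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("strings", ".txt");
        writeAll(file, Arrays.asList("tea", "ted", "ten"));
        forEachLine(file, System.out::println);
        Files.delete(file);
    }
}
```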

The two obvious choices are ArrayList and LinkedList. A LinkedList appears to be slightly slower than an ArrayList. Here's my benchmarking code:

import java.util.*;

public class ListTest {
    private static final int N = 50000;
    private static final float NANO_TO_MILLI = 1.0e-6f;

    public static void main(String[] args) {
        String[] strings = new String[N];
        for (int i = 0; i < N; ++i) {
            strings[i] = Integer.toString(i);
        }

        System.out.print("ArrayList: ");
        benchmark(strings, new ArrayList<String>());

        System.out.print("LinkedList: ");
        benchmark(strings, new LinkedList<String>());
    }

    private static void benchmark(String[] strings, List<String> list) {
        // measure how long it takes to add the strings
        long start = System.nanoTime();
        for (String s : strings) {
            list.add(s);
        }
        long addTime = System.nanoTime() - start;

        // measure how long it takes to iterate the list
        start = System.nanoTime();
        int i = 0;
        for (String s : list) {
            ++i;
        }
        long iterateTime = System.nanoTime() - start;

        // report the results
        System.out.println(String.format("add: %.2fms; iterate: %.2fms (%d strings)",
            addTime * NANO_TO_MILLI,
            iterateTime * NANO_TO_MILLI,
            i));
    }
}

And here are the results of a typical run:

ArrayList: add: 5.52ms; iterate: 7.66ms (50000 strings)
LinkedList: add: 7.79ms; iterate: 8.32ms (50000 strings)

This was on a Windows machine with an Intel Core2 Quad Q6600 2.4GHz CPU.

Note that this only measures the overall time. It doesn't measure the variation in add time of individual strings, which I would expect to be higher for ArrayList than for LinkedList, due to the need to reallocate the internal array.

EDIT: If I modify main to repeat the test five times in a row, with a call to System.gc() after each call to benchmark, then I get some interesting results:

ArrayList: add: 5.84ms; iterate: 7.84ms (50000 strings)
LinkedList: add: 7.24ms; iterate: 8.27ms (50000 strings)

ArrayList: add: 0.45ms; iterate: 0.60ms (50000 strings)
LinkedList: add: 0.84ms; iterate: 5.35ms (50000 strings)

ArrayList: add: 0.52ms; iterate: 0.72ms (50000 strings)
LinkedList: add: 0.81ms; iterate: 5.57ms (50000 strings)

ArrayList: add: 3.77ms; iterate: 0.71ms (50000 strings)
LinkedList: add: 3.35ms; iterate: 0.93ms (50000 strings)

ArrayList: add: 3.39ms; iterate: 0.87ms (50000 strings)
LinkedList: add: 3.38ms; iterate: 0.86ms (50000 strings)

This is probably due to caching by the CPU. Note that LinkedList can be slightly faster for adding strings (e.g., in the last two iterations), although it can also be much slower. Iteration can also be drastically slower for LinkedList, probably because of its lack of locality.
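The modified main described in the EDIT is not shown in the answer. A rough reconstruction might look like the following; the class name, the RUNS constant, and the exact placement of the System.gc() calls are assumptions based on the description:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RepeatedListTest {
    private static final int N = 50000;
    private static final int RUNS = 5; // assumed from "five times in a row"
    private static final float NANO_TO_MILLI = 1.0e-6f;

    public static void main(String[] args) {
        String[] strings = new String[N];
        for (int i = 0; i < N; ++i) {
            strings[i] = Integer.toString(i);
        }

        for (int run = 0; run < RUNS; ++run) {
            System.out.print("ArrayList: ");
            benchmark(strings, new ArrayList<String>());
            System.gc(); // only a request; the JVM may ignore it

            System.out.print("LinkedList: ");
            benchmark(strings, new LinkedList<String>());
            System.gc();

            System.out.println(); // blank line between runs
        }
    }

    // Same measurement as the original benchmark method, except that it also
    // returns the number of strings seen while iterating.
    static int benchmark(String[] strings, List<String> list) {
        long start = System.nanoTime();
        for (String s : strings) {
            list.add(s);
        }
        long addTime = System.nanoTime() - start;

        start = System.nanoTime();
        int count = 0;
        for (String s : list) {
            ++count;
        }
        long iterateTime = System.nanoTime() - start;

        System.out.printf("add: %.2fms; iterate: %.2fms (%d strings)%n",
            addTime * NANO_TO_MILLI,
            iterateTime * NANO_TO_MILLI,
            count);
        return count;
    }
}
```

Timings from a hand-rolled loop like this vary a lot between runs and machines, as the numbers above already show, so they are best read as a rough indication only.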

Use an implementation of the List interface. It's generally considered that ArrayList is the best general-purpose collection to use, so do something as simple as this for storing your strings:

List<String> stringList = new ArrayList<String>();
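To tie this back to the trie in the question, here is a sketch of collecting the strings below a prefix in a single traversal, so no separate counting pass is needed. The asker's trie is not shown, so the Node class and all method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TrieCollectDemo {
    // Hypothetical minimal trie node; children are kept sorted by character.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.isWord = true;
    }

    // Returns every stored word that starts with 'prefix'.
    static List<String> wordsWithPrefix(Node root, String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < prefix.length() && node != null; i++) {
            node = node.children.get(prefix.charAt(i));
        }
        if (node != null) {
            collect(node, new StringBuilder(prefix), out);
        }
        return out;
    }

    // Depth-first walk that appends each word to 'out' as it is found.
    private static void collect(Node node, StringBuilder current, List<String> out) {
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            current.append(c);
            collect(e.getValue(), current, out);
            current.setLength(current.length() - 1); // undo the append
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String w : new String[] {"tea", "ted", "ten", "to"}) {
            insert(root, w);
        }
        System.out.println(wordsWithPrefix(root, "te")); // prints [tea, ted, ten]
    }
}
```

The list simply grows as words are discovered; the occasional reallocation is amortized over all the add calls.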
