What is the fastest way to store an unknown number of strings in Java?
I want to store an unknown number of strings and later read them in the order they were added. As I said, the only features I need are:
The problem is that I want to output the strings from part of a trie, so counting the strings before returning them would double the time the operation takes.
(Another solution would be to keep track of the number of strings in the trie with an attribute, but since I only want to return part of the trie, that isn't a perfect solution either.)
LinkedList<String> sounds like a good bet to me...
Getting at an arbitrary element is expensive, which is the usual reason not to use it... but it sounds like that isn't a problem in your case.
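A minimal sketch of that usage, assuming all you do is append and then iterate (the class and method names here are made up for illustration):

```java
import java.util.LinkedList;
import java.util.List;

class InsertionOrderDemo {
    // Appends each input to a LinkedList; add() links a new node at the tail,
    // so no existing elements are ever copied.
    static List<String> collect(String... inputs) {
        List<String> result = new LinkedList<>();
        for (String s : inputs) {
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = collect("b", "a", "c");
        // Iterating walks the nodes from head to tail, i.e. in insertion order.
        for (String s : strings) {
            System.out.println(s);
        }
    }
}
```

As long as you stick to add() and a for-each loop, you never pay for the slow get(index) path.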
An ArrayList is generally faster than a LinkedList. If you don't specify an appropriate initial capacity, though, each time the capacity is exhausted it has to allocate a new, larger array and copy the elements into it (the growth factor is implementation-specific; OpenJDK grows the array by about half rather than doubling it).
You could use a LinkedList to avoid this cost, but the average time will probably be higher.
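If you can make even a rough guess at the count, you can pass it to the ArrayList constructor as a capacity hint. A small sketch, where expectedCount is a hypothetical estimate and not something from the question:

```java
import java.util.ArrayList;
import java.util.List;

class PresizedListDemo {
    // expectedCount is only a hint (an assumption for this sketch); the list
    // still grows on its own if actualCount turns out to be larger.
    static List<String> fill(int expectedCount, int actualCount) {
        List<String> list = new ArrayList<>(expectedCount);
        for (int i = 0; i < actualCount; i++) {
            list.add(Integer.toString(i));
        }
        return list;
    }

    public static void main(String[] args) {
        // Guessing too low is harmless: the list just reallocates as usual.
        List<String> list = fill(16, 1000);
        System.out.println(list.size() + " strings stored");
    }
}
```

A guess that is too low only costs a few extra reallocations, and one that is too high only wastes some memory, so the hint doesn't need to be accurate.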
Whichever collection you use, if you run low on memory the GC will kick in, which can also introduce some delay. A truly unbounded "unknown amount" cannot be stored in any in-memory collection. If "unknown" can be so large that an in-memory collection is out of the question, you'll have to use a file or a database.
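For the file route, here is a rough sketch that writes one string per line to a temporary file and streams the lines back in the same order. It assumes the strings contain no line breaks, and all the names (FileSpillDemo, writeAll, readEach, roundTrip) are invented for the example:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class FileSpillDemo {
    // Writes one string per line; assumes no string contains a line break.
    static void writeAll(Path file, Iterable<String> strings) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String s : strings) {
                out.write(s);
                out.newLine();
            }
        }
    }

    // Streams the lines back in file order without loading them all at once.
    static void readEach(Path file, Consumer<String> action) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                action.accept(line);
            }
        }
    }

    // Demo helper: writes to a temp file, reads everything back, cleans up.
    static List<String> roundTrip(List<String> strings) {
        List<String> readBack = new ArrayList<>();
        try {
            Path file = Files.createTempFile("strings", ".txt");
            try {
                writeAll(file, strings);
                readEach(file, readBack::add);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return readBack;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("first", "second", "third")));
    }
}
```

In real use you would pass readEach a consumer that processes each line as it arrives; roundTrip collects the lines into a list only to show that the order is preserved.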
The two obvious choices are ArrayList and LinkedList. A LinkedList appears to be slightly slower than ArrayList. Here's my benchmarking code:
import java.util.*;

public class ListTest {
    private static final int N = 50000;
    private static final float NANO_TO_MILLI = 1.0e-6f;

    public static void main(String[] args) {
        String[] strings = new String[N];
        for (int i = 0; i < N; ++i) {
            strings[i] = Integer.toString(i);
        }
        System.out.print("ArrayList: ");
        benchmark(strings, new ArrayList<String>());
        System.out.print("LinkedList: ");
        benchmark(strings, new LinkedList<String>());
    }

    private static void benchmark(String[] strings, List<String> list) {
        // measure how long it takes to add the strings
        long start = System.nanoTime();
        for (String s : strings) {
            list.add(s);
        }
        long addTime = System.nanoTime() - start;

        // measure how long it takes to iterate the list
        start = System.nanoTime();
        int i = 0;
        for (String s : list) {
            ++i;
        }
        long iterateTime = System.nanoTime() - start;

        // report the results
        System.out.println(String.format("add: %.2fms; iterate: %.2fms (%d strings)",
                addTime * NANO_TO_MILLI,
                iterateTime * NANO_TO_MILLI,
                i));
    }
}
And here are the results of a typical run:
ArrayList: add: 5.52ms; iterate: 7.66ms (50000 strings)
LinkedList: add: 7.79ms; iterate: 8.32ms (50000 strings)
This was on a Windows machine with an Intel Core2 Quad Q6600 2.4GHz CPU.
Note that this only measures the overall time. It doesn't measure the variation in add time for individual strings, which I would expect to be higher for ArrayList than for LinkedList, due to the need to reallocate the internal array.
EDIT: If I modify main to repeat the test five times in a row, with a call to System.gc() after each call to benchmark, then I get some interesting results:
ArrayList: add: 5.84ms; iterate: 7.84ms (50000 strings)
LinkedList: add: 7.24ms; iterate: 8.27ms (50000 strings)
ArrayList: add: 0.45ms; iterate: 0.60ms (50000 strings)
LinkedList: add: 0.84ms; iterate: 5.35ms (50000 strings)
ArrayList: add: 0.52ms; iterate: 0.72ms (50000 strings)
LinkedList: add: 0.81ms; iterate: 5.57ms (50000 strings)
ArrayList: add: 3.77ms; iterate: 0.71ms (50000 strings)
LinkedList: add: 3.35ms; iterate: 0.93ms (50000 strings)
ArrayList: add: 3.39ms; iterate: 0.87ms (50000 strings)
LinkedList: add: 3.38ms; iterate: 0.86ms (50000 strings)
This is probably due to caching by the CPU. Note that LinkedList can be slightly faster at adding strings (e.g., in the last two iterations), although it can also be much slower. Iteration can also be drastically slower for LinkedList, probably because of its lack of locality.
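For reference, the modified version described in the EDIT might look roughly like the sketch below. This is a reconstruction, not the exact code that produced the numbers above: the class name and the ROUNDS constant are my own, and benchmark returns the element count here (the original returns nothing) purely so it can be checked.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RepeatedListTest {
    private static final int N = 50000;
    private static final int ROUNDS = 5; // assumed name; the EDIT just says "five times"
    private static final double NANO_TO_MILLI = 1.0e-6;

    public static void main(String[] args) {
        String[] strings = new String[N];
        for (int i = 0; i < N; ++i) {
            strings[i] = Integer.toString(i);
        }
        for (int round = 0; round < ROUNDS; ++round) {
            System.out.print("ArrayList: ");
            benchmark(strings, new ArrayList<String>());
            System.gc(); // only a request; the JVM may ignore it
            System.out.print("LinkedList: ");
            benchmark(strings, new LinkedList<String>());
            System.gc();
        }
    }

    // Times the add phase and the iterate phase, prints both, and
    // returns the number of elements seen while iterating.
    static int benchmark(String[] strings, List<String> list) {
        long start = System.nanoTime();
        for (String s : strings) {
            list.add(s);
        }
        long addTime = System.nanoTime() - start;

        start = System.nanoTime();
        int count = 0;
        for (String s : list) {
            ++count;
        }
        long iterateTime = System.nanoTime() - start;

        System.out.println(String.format("add: %.2fms; iterate: %.2fms (%d strings)",
                addTime * NANO_TO_MILLI, iterateTime * NANO_TO_MILLI, count));
        return count;
    }
}
```

Timings from a hand-rolled loop like this vary a lot between runs and JVMs, so treat the output as a rough indication only.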
Use an implementation of the List interface. ArrayList is generally considered the best general-purpose collection, so do something as simple as this to store your strings:
List<String> stringList = new ArrayList<String>();
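Tying this back to the trie in the question: since the list grows on demand, you can collect the strings of a sub-trie in a single traversal without counting them first. The question doesn't show its trie class, so the Node type and method names below are purely hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TrieCollectDemo {
    // Hypothetical minimal trie node; not the asker's actual class.
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    static void insert(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns every stored word that starts with prefix, in one pass.
    static List<String> wordsWithPrefix(Node root, String prefix) {
        List<String> out = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), out);
        return out;
    }

    // Depth-first walk; appends each complete word to out as it is found.
    private static void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        Node root = new Node();
        insert(root, "car");
        insert(root, "cat");
        insert(root, "dog");
        System.out.println(wordsWithPrefix(root, "ca"));
    }
}
```

The TreeMap is only there to make the output order predictable; a HashMap or an array of children would work the same way for collecting.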