Fastest way to load a huge text file into an int array
I have a big text file (over 100 MB) with one integer per line (about 10 million numbers in total). Of course, the size and count may change, so I don't know them in advance.
I want to load the file into an int[], making the process as fast as possible. At first I came up with this solution:
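import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public int[] fileToArray(String fileName) throws IOException
{
    // Read every line into memory first, then parse each one
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    for (String line: list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}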
It was pretty fast: 5.5 seconds, of which 5.1 s went to the readAllLines call and 0.4 s to the loop.
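A per-phase breakdown like this can be reproduced by timing each step with System.nanoTime(). The following is a minimal sketch of that measurement for the readAllLines approach; the class name TimedLoader is hypothetical, and the printed format is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: times the two phases of the readAllLines approach separately.
public class TimedLoader {

    public static int[] fileToArray(String fileName) throws IOException {
        Path path = Paths.get(fileName);

        // Phase 1: read every line into a List<String>
        long t0 = System.nanoTime();
        List<String> lines = Files.readAllLines(path);
        long t1 = System.nanoTime();

        // Phase 2: parse each line into the primitive array
        int[] res = new int[lines.size()];
        for (int i = 0; i < res.length; i++) {
            res[i] = Integer.parseInt(lines.get(i));
        }
        long t2 = System.nanoTime();

        System.out.printf("read: %d ms, parse: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        return res;
    }
}
```

Single measurements like this include JIT warm-up and operating-system file-cache effects, so a first run and a repeated run on the same file can give noticeably different numbers.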
But then I decided to try using BufferedReader, and came up with this different solution:
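import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    // Parse while reading, collecting boxed Integers
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();
    // Copy (and unbox) into a primitive array
    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i: ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}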
This was even faster: 3.1 seconds, with 3 s spent in the while loop and less than 0.1 s in the for loop.
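If the intermediate list is kept, the final copy loop can also be written with the Java 8 Stream API. This is only a shorter spelling of the same unboxing step, not a memory saving, because the boxed ArrayList<Integer> still has to be built first; the class and method names are hypothetical.

```java
import java.util.List;

public class UnboxExample {

    // Converts a list of boxed Integers into a primitive int[],
    // doing the same work as a manual copy loop.
    public static int[] toIntArray(List<Integer> ints) {
        return ints.stream()
                   .mapToInt(Integer::intValue) // unbox each element
                   .toArray();
    }
}
```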
I know there isn't much room for optimization here, at least in time, but building an ArrayList and then an int[] seems like too much memory to me.
Any ideas on how to make this faster, or how to avoid the intermediate ArrayList?
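One way to avoid the intermediate ArrayList<Integer> without Java 8 streams is to manage a growable int[] by hand: grow it geometrically when it fills up, then trim it once at the end. This is a minimal sketch that has not been benchmarked here; the class name and the initial capacity of 1024 are arbitrary, hypothetical choices. It still briefly holds two arrays during each resize and during the final trim, but it creates no per-element Integer objects.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper: reads one int per line into a self-managed int[] buffer.
public class GrowableIntLoader {

    public static int[] fileToArray(String fileName) throws IOException {
        int[] buf = new int[1024]; // arbitrary initial capacity
        int size = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (size == buf.length) {
                    // Double the capacity so appends stay amortized O(1), with no boxing
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                buf[size++] = Integer.parseInt(line);
            }
        }
        // Trim to the exact number of values read
        return Arrays.copyOf(buf, size);
    }
}
```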
Just for comparison, I do the same task in FreePascal in 1.9 seconds [see edit], using the TStringList class and the StrToInt function.
EDIT: Since the Java method turned out to be so fast, I had to improve the FreePascal one. It now takes 330 to 360 ms.
If you're using Java 8, you can eliminate the intermediate ArrayList by calling BufferedReader.lines(), mapping each line to an int, and collecting the values into an array.
You should also use try-with-resources, so that the reader is closed automatically even if an exception is thrown.
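try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    return br.lines()                     // Stream<String> of lines, read lazily
             .mapToInt(Integer::parseInt) // IntStream: no Integer boxing
             .toArray();                  // collects into an int[] of the exact size
}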
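A closely related variant uses java.nio.file.Files.lines(Path), which decodes the file as UTF-8 instead of using the platform default charset that FileReader relies on. The returned stream keeps the file open, so it is closed with try-with-resources as well. This is a sketch with a hypothetical class name; for a file of plain ASCII digits it should produce the same array as the BufferedReader version. In both variants, an I/O error that occurs while the stream is being consumed surfaces as an UncheckedIOException rather than an IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamLoader {

    public static int[] fileToArray(String fileName) throws IOException {
        // Files.lines reads as UTF-8; the stream holds an open file handle,
        // so try-with-resources closes it when parsing finishes or fails
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return lines.mapToInt(Integer::parseInt).toArray();
        }
    }
}
```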
I'm not sure whether this is faster, but it is certainly much easier to maintain.
Edit: Apparently it is MUCH faster.