Fastest way to load a huge text file into an int array

I have a big text file (100+ MB) with one integer per line (about 10 million numbers). Of course, the size and the count may change, so I don't know them in advance.

I want to load the file into an int[], making the process as fast as possible. First I came up with this solution:

public int[] fileToArray(String fileName) throws IOException
{
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    for (String line: list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}

It was pretty fast: 5.5 seconds, of which 5.1 s goes to the readAllLines call and 0.4 s to the loop.

But then I decided to try a BufferedReader and came up with this different solution:

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();

    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i: ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}

This was even faster! 3.1 seconds: about 3 s for the while loop and less than 0.1 s for the for loop.

I know there is not much room for optimization here, at least in time, but using an ArrayList and then an int[] seems like too much memory to me.

Any ideas on how to make this faster, or how to avoid the intermediate ArrayList?

Just for comparison, I do this same task with FreePascal in 1.9 seconds [see edit], using the TStringList class and the StrToInt function.

EDIT: Since I got such a short time with the Java method, I had to improve the FreePascal one. It now takes 330~360 ms.

If you're using Java 8, you can eliminate the intermediate ArrayList by calling lines(), mapping each line to an int, and then collecting the values into an array.

You should also use try-with-resources for proper exception handling and automatic closing.

try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    return br.lines()
             .mapToInt(Integer::parseInt)
             .toArray();
}
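For anyone who wants to try the snippet as-is, here is a minimal self-contained sketch of the same approach. The class name IntFileLoader and the temporary test file are assumptions added for illustration; only the lines() / mapToInt / toArray chain comes from the snippet above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical wrapper class, named only for this example.
class IntFileLoader {

    // Reads one integer per line and returns them as a primitive int[].
    static int[] fileToArray(String fileName) throws IOException {
        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            return br.lines()
                     .mapToInt(Integer::parseInt) // throws NumberFormatException on bad input
                     .toArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the real 100+ MB input.
        Path tmp = Files.createTempFile("ints", ".txt");
        try {
            Files.write(tmp, Arrays.asList("1", "-2", "300"));
            int[] values = fileToArray(tmp.toString());
            System.out.println(Arrays.toString(values)); // prints [1, -2, 300]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Note that Integer.parseInt does not accept surrounding whitespace or empty lines, so this sketch assumes the file contains exactly one clean integer per line.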

I'm not sure if this is faster, but it is certainly much easier to maintain.

Edit: It is apparently MUCH faster.
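If Java 8 streams are not an option, another way to avoid the boxed ArrayList<Integer> is to read straight into a primitive int[] and grow it as needed. This is not part of the answer above and has not been timed against it; the class name GrowingIntBuffer and the initial capacity of 1024 are arbitrary choices for this sketch.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical class, named only for this example.
class GrowingIntBuffer {

    // Reads one integer per line into an int[] without boxing to Integer.
    static int[] fileToArray(String fileName) throws IOException {
        int[] buf = new int[1024]; // assumed starting capacity; any positive size works
        int size = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (size == buf.length) {
                    // Double the capacity when the buffer is full.
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                buf[size++] = Integer.parseInt(line);
            }
        }
        // Trim the buffer to the number of values actually read.
        return Arrays.copyOf(buf, size);
    }
}
```

The trade-off is a few extra array copies while the buffer grows, in exchange for never allocating an Integer object per line.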
