
Fastest way to load a huge text file into an int array

I have a large text file (over 100 MB) with one integer per line, about 10 million numbers in total. The size and the line count may vary, so I don't know them in advance.

I want to load the file into an int[] as fast as possible. My first solution was:

public int[] fileToArray(String fileName) throws IOException
{
    List<String> list = Files.readAllLines(Paths.get(fileName));
    int[] res = new int[list.size()];
    int pos = 0;
    for (String line: list)
    {
        res[pos++] = Integer.parseInt(line);
    }
    return res;
}

It was pretty fast: 5.5 seconds, of which 5.1 s went to the readAllLines call and 0.4 s to the loop.

But then I decided to try using BufferedReader, and came to this different solution:

public int[] fileToArray(String fileName) throws IOException
{
    BufferedReader bufferedReader = new BufferedReader(new FileReader(new File(fileName)));
    ArrayList<Integer> ints = new ArrayList<Integer>();
    String line;
    while ((line = bufferedReader.readLine()) != null)
    {
        ints.add(Integer.parseInt(line));
    }
    bufferedReader.close();

    int[] res = new int[ints.size()];
    int pos = 0;
    for (Integer i: ints)
    {
        res[pos++] = i.intValue();
    }
    return res;
}

This was even faster: 3.1 seconds, with about 3 s spent in the while loop and less than 0.1 s in the for loop.

I know there is not much room left for optimization, at least in time, but filling an ArrayList and then copying it into an int[] seems like too much memory to me.

Any ideas on how to make this faster, or how to avoid the intermediate ArrayList?

Just for comparison, I do this same task with FreePascal in 1.9 seconds [see edit], using the TStringList class and the StrToInt function.

EDIT: Since the Java method turned out to be quite fast, I had to improve the FreePascal one. It now takes 330~360 ms.

If you're using Java 8, you can eliminate the intermediate ArrayList by calling lines() on the BufferedReader, mapping each line to an int, and collecting the values into an array.

You should also be using try-with-resources for proper exception handling and auto-closing.

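public int[] fileToArray(String fileName) throws IOException {
    // try-with-resources closes the reader even if parsing throws
    try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
        return br.lines()                     // Stream<String>, one element per line
                 .mapToInt(Integer::parseInt) // IntStream, no boxing to Integer
                 .toArray();
    }
}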

I'm not sure if this is faster, but it is certainly much easier to maintain.

Edit: It is apparently MUCH faster.
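
If you would rather not use streams, the boxing and the second copy pass can also be avoided by filling a primitive int[] directly and doubling it whenever it runs out of room. The following is only a sketch under a few assumptions: the class name IntFileLoader, the initialCapacity parameter and the default of 1 << 20 are made up for illustration, the file is assumed to be UTF-8 with exactly one integer per line (a blank line would make Integer.parseInt throw, just like in the question's code), and the doubling step is not guarded against overflow for arrays beyond about a billion elements. I have not timed it against the versions above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, not part of the original posts.
final class IntFileLoader {

    // Convenience overload; the default capacity is an arbitrary guess.
    static int[] fileToArray(String fileName) throws IOException {
        return fileToArray(fileName, 1 << 20);
    }

    static int[] fileToArray(String fileName, int initialCapacity) throws IOException {
        // Primitive buffer: no Integer objects are allocated per line.
        int[] buf = new int[Math.max(1, initialCapacity)];
        int count = 0;
        try (BufferedReader br =
                 Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (count == buf.length) {
                    // Buffer is full: double it (amortized O(1) per element).
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
                buf[count++] = Integer.parseInt(line);
            }
        }
        // Trim the unused tail so the result has exactly one slot per line.
        return Arrays.copyOf(buf, count);
    }
}
```

Calling IntFileLoader.fileToArray(fileName) returns the same kind of int[] as the methods above. The trade-off is a temporarily oversized buffer (at most about twice the final size) in exchange for never holding a list of boxed Integer objects.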
