
Best way to read a huge file (in MB) in Java

I was going through a post that says to use either BufferedReader or MappedByteBuffer. I decided to test both myself with a 291.0 MB file, but I am still not able to decide.

    BufferedReader reader = new BufferedReader(new FileReader("/Users/rachana/part-00000"));
    String line = null;
    while((line = reader.readLine())!=null) {
        System.out.println(line);
    }


    ~~~~~~ Heap utilization in MB ~~~~~~
    Start Date  21:10:20
    End Date 21:17:48
    Time used 448 second
           7.50 min
    Used Memory In MB:28
    Free Memory:81
    Total Memory:109
    Max Memory:1820

With MappedByteBuffer

    RandomAccessFile aFile = new RandomAccessFile("/Users/rachana/part-00000", "r");
    FileChannel inChannel = aFile.getChannel();
    MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
    buffer.load();
    for (int i = 0; i < buffer.limit(); i++) {
        System.out.print((char) buffer.get());
    }
    buffer.clear(); // do something with the data and clear/compact it.
    inChannel.close();
    aFile.close();



    ~~~~~~ Heap utilization in MB ~~~~~~
    Start Date  21:20:40
    End Date 21:33:52
    Time used 792 second
           13.2 min
    Used Memory In MB:4
    Free Memory:104
    Total Memory:109
    Max Memory:1820

It clearly shows that MappedByteBuffer uses less memory but more time, whereas BufferedReader uses more memory but less time.

I am trying to find the right balance, and also a way to read lines using MappedByteBuffer.

Any suggestions would be helpful.

The slowest part of what you are doing is printing to the screen. I suggest you stop doing that, and you will find that the memory-mapped file is much faster (as long as you are not printing one character at a time to the console).
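To make that concrete, here is a minimal sketch of the same two reads with the console output taken out, so only the input is timed. The class name `ReadTiming` and both method names are made up for illustration, and the file is assumed to be smaller than 2 GB so that it fits in a single mapping. Treat the timings as rough: whichever read runs second probably benefits from the OS file cache.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not from the original post.
class ReadTiming {

    // Reads every line through a BufferedReader and counts them; nothing is printed.
    static long countLinesBuffered(Path file) throws IOException {
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    // Maps the whole file (assumed < 2 GB) and counts '\n' bytes; nothing is printed.
    static long countNewlinesMapped(Path file) throws IOException {
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (buffer.hasRemaining()) {
                if (buffer.get() == '\n') {
                    newlines++;
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        long t0 = System.nanoTime();
        long buffered = countLinesBuffered(file);
        long t1 = System.nanoTime();
        long mapped = countNewlinesMapped(file);
        long t2 = System.nanoTime();
        System.out.printf("BufferedReader:   %d lines in %d ms%n", buffered, (t1 - t0) / 1_000_000);
        System.out.printf("MappedByteBuffer: %d newlines in %d ms%n", mapped, (t2 - t1) / 1_000_000);
    }
}
```

For anything more serious than a sanity check, a benchmark harness with warm-up runs would give more trustworthy numbers.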

Note: these two are not interchangeable unless you are using an ISO-8859-1 or US-ASCII encoded text file. BufferedReader is for text, and a memory-mapped file is for binary data.
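If you do want lines out of a MappedByteBuffer, one possible approach is to scan the mapped bytes for `'\n'` and decode each slice with an explicit charset. The sketch below assumes an encoding in which the byte 0x0A only ever means a line feed (true for US-ASCII, ISO-8859-1 and UTF-8, not for UTF-16) and a file smaller than 2 GB. `MappedLines` and `forEachLine` are names invented for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper class, not from the original post.
class MappedLines {

    // Calls action once per line; handles "\n" and "\r\n" line endings.
    static void forEachLine(Path file, Charset charset, Consumer<String> action) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int limit = buffer.limit();
            int lineStart = 0;
            for (int i = 0; i < limit; i++) {
                if (buffer.get(i) == '\n') {          // absolute get, position is untouched
                    action.accept(decode(buffer, lineStart, i, charset));
                    lineStart = i + 1;
                }
            }
            if (lineStart < limit) {                  // last line without a trailing newline
                action.accept(decode(buffer, lineStart, limit, charset));
            }
        }
    }

    // Copies bytes [start, end) out of the mapping and decodes them with the given charset.
    private static String decode(ByteBuffer buffer, int start, int end, Charset charset) {
        if (end > start && buffer.get(end - 1) == '\r') {
            end--;                                    // drop the '\r' of a "\r\n" pair
        }
        byte[] bytes = new byte[end - start];
        ByteBuffer view = buffer.duplicate();         // independent position, same content
        view.position(start);
        view.get(bytes);
        return new String(bytes, charset);
    }
}
```

Note that each line is still copied onto the heap as a `String`, so for plain line-by-line text processing this saves little over BufferedReader.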

BTW, there is no point looking at the memory used if you ignore the number of GCs performed. If you only care about memory used at the start and finish, you should do a full GC with System.gc() before you measure, and I would expect you to see a small, random difference (could be negative) in both cases.

If you care about allocations, you need a larger eden size, e.g. 2 GB, which starts empty (after a full GC), or you could use a profiler to measure the allocation. In the first case the Strings will allocate the most data, and in the second the writing to the console will create the most.
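A rough sketch of that kind of measurement is below. `HeapSnapshot` and its method names are hypothetical. Keep in mind that `System.gc()` is only a request to the JVM, so the figures are indicative rather than exact.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not from the original post.
class HeapSnapshot {

    // Heap currently in use, in bytes (total minus free).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests a full GC first so that garbage is (most likely) not counted.
    static long usedHeapAfterGc() {
        System.gc();
        return usedHeapBytes();
    }

    // Sum of collections reported by all collectors; collectors that report -1 are skipped.
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                count += c;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long heapBefore = usedHeapAfterGc();
        long gcBefore = totalGcCount();

        // ... run the code under test here ...

        long gcDuring = totalGcCount() - gcBefore;
        long heapAfter = usedHeapAfterGc();
        System.out.println("GCs during the run: " + gcDuring);
        System.out.println("Retained heap delta (bytes): " + (heapAfter - heapBefore));
    }
}
```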

It clearly shows that MappedByteBuffer uses less memory but more time, whereas BufferedReader uses more memory but less time.

Obviously that can't be true, and it isn't. You're mapping the entire 300 MB file into memory with the MappedByteBuffer, and not with the BufferedReader. The explanation is that MappedByteBuffer memory doesn't come from the heap. It uses memory all right, as much as the file size, which is far more than your BufferedReader code. You're just not measuring it here.
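One way to see the memory that the heap figures miss is the JVM's buffer pool MXBeans. The sketch below assumes a HotSpot/OpenJDK-style JVM that exposes a pool named "mapped"; on other JVMs the lookup may find nothing and return -1. `MappedMemoryProbe` is a made-up name.

```java
import java.io.IOException;
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not from the original post.
class MappedMemoryProbe {

    // Bytes currently held by mapped buffers, or -1 if no pool named "mapped" exists
    // (the pool name is an assumption about the JVM in use).
    static long mappedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("mapped".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        long heapBefore = usedHeapBytes();
        long mappedBefore = mappedBytes();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println("isDirect:     " + buffer.isDirect());
            System.out.println("heap delta:   " + (usedHeapBytes() - heapBefore));
            System.out.println("mapped delta: " + (mappedBytes() - mappedBefore));
        }
    }
}
```

On such a JVM the mapped delta should be roughly the file size while the heap delta stays small, which is why the `Runtime` numbers in the question look so favourable.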

Similarly your time measurements are also invalid, as they are dominated by System.out.println(), which isn't input, and which one would hope isn't part of the final application either.

So your benchmark is completely invalid in all respects.

Use BufferedReader. You can read millions of lines a second with that. It's fast enough.

I would go with the first one unless you're really trying to scrape the barrel for memory optimisations.

Reasons:

  • It's easier to read the code.
  • Users are more likely to notice the 100% speed up than the 24 MB of extra memory.

As you are doing file I/O, you should bear in mind that the I/O operations are likely to be very much slower than any work done by the CPU in your code.

But there are other considerations. Optimisations tend to make code more complicated and harder to understand. To understand your MappedByteBuffer code a reader needs to understand how a MappedByteBuffer works in addition to everything they need to understand for file input.

File reading is commonly done. So it should not surprise you that Java already provides code to help you. That code will have been written by experts, tested and debugged. Unless you have special requirements you should always use such code rather than writing your own. That is, I recommend using BufferedReader (your first approach).
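As an illustration of leaning on the standard library, the sketch below uses `Files.lines`, which streams the file lazily, to find the longest line. The class name `LineStats` is invented, and UTF-8 is only an assumption about the file's encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper class, not from the original post.
class LineStats {

    // Length (in chars) of the longest line, or 0 for an empty file.
    static int longestLine(Path file) throws IOException {
        // try-with-resources closes the underlying file when the stream is done
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.mapToInt(String::length).max().orElse(0);
        }
    }
}
```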

