
Using NIO vs RandomAccessFile to read chunks of files

I want to read a large text file (several GB) and process it without loading the whole file into memory, by loading it in chunks. (The processing involves counting word occurrences.)

If I'm using a ConcurrentHashMap to process the file in parallel for efficiency, is there a way to use NIO or RandomAccessFile to read it in chunks? Would that make it even more efficient?

The current implementation uses a buffered reader and goes something like this:

while(lines.size() <= numberOfLines && (line = bufferedReader.readLine()) != null) {
     lines.add(line);
}

lines.parallelStream().. // processing logic using ConcurrentHashMap
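
For reference, a self-contained sketch of this chunk-and-count approach; the file name, chunk size, and whitespace tokenization below are assumptions, not part of my actual code:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class ChunkedWordCount {
    public static void main(String[] args) throws IOException {
        final int numberOfLines = 100_000;              // assumed chunk size
        ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("file.txt"), StandardCharsets.UTF_8))) {
            String line = bufferedReader.readLine();
            while (line != null) {
                // Collect one chunk of lines...
                List<String> lines = new ArrayList<>(numberOfLines);
                while (line != null && lines.size() < numberOfLines) {
                    lines.add(line);
                    line = bufferedReader.readLine();
                }
                // ...then count word occurrences in parallel; merge() is atomic on ConcurrentHashMap.
                lines.parallelStream()
                     .flatMap(l -> Arrays.stream(l.split("\\s+")))
                     .filter(w -> !w.isEmpty())
                     .forEach(w -> counts.merge(w, 1L, Long::sum));
            }
        }
        System.out.println("distinct words: " + counts.size());
    }
}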

RandomAccessFile only makes sense if you intend to "jump" around within the file, and your description of what you're doing doesn't sound like that. NIO makes sense if you have to cope with lots of parallel communication and want non-blocking operations, e.g. on sockets. That doesn't seem to be your use case either.

So my suggestion is to stick with the simple approach of using a BufferedReader on top of an InputStreamReader(FileInputStream) (don't use FileReader, because it doesn't let you specify the charset/encoding) and go through the data as you showed in your sample code. Leave out the parallelStream at first; only if you see poor performance should you try it.
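
A minimal sketch of that suggestion, assuming UTF-8 and a hypothetical file.txt; the point is the explicit charset on the InputStreamReader, which FileReader (before Java 11) does not let you set:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SimpleWordCount {
    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        // BufferedReader over InputStreamReader(FileInputStream) with an explicit charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream("file.txt"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Plain sequential processing; no parallelStream until profiling shows it is needed.
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1L, Long::sum);
                    }
                }
            }
        }
        System.out.println("distinct words: " + counts.size());
    }
}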

Always remember: Premature optimization is the root of all evil.

The obvious Java 8 solution is:

String lines = Files.readAllLines(Paths.get("file"), StandardCharsets.UTF_8).stream().reduce("", (a, b) -> a + b);

Honestly I have no idea if it is faster, but I guess under the hood it does not read it into a buffer, so at least in theory it should be faster.
