
Divide a huge file across threads in Java

I have a big file, more than 1 GB, and I want to search it for occurrences of a certain word. So I want to split the work across several threads, where each thread handles a portion of the file.

What is the best approach to do this? I thought about reading the file into several buffers of fixed size and passing each thread a buffer.

Is there a better way to do this?

[EDIT] I want to execute each thread on a different device.

A ByteBuffer over, say, a RandomAccessFile would be feasible for files smaller than 2 GB (2^31 bytes), since buffer positions are int-indexed.

The general solution would be to use FileChannel , with its MappedByteBuffer .

With several buffers one must take care to make them overlap, by at least the word length minus one byte, so a word straddling a buffer boundary can still be found.
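A minimal sketch of that approach using only JDK classes (the class and method names here are invented for illustration): the file is mapped in chunks that overlap by the word length minus one byte, and each chunk only counts matches that start before its overlap tail, so a match on a boundary is found exactly once.

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSearch {

    // Count matches of `word` that START before `startLimit` in `data`.
    // Matches beginning in the overlap tail are left for the next chunk,
    // so nothing is counted twice.
    static int countInRegion(byte[] data, byte[] word, int startLimit) {
        int count = 0;
        int limit = Math.min(startLimit, data.length - word.length + 1);
        for (int i = 0; i < limit; i++) {
            int j = 0;
            while (j < word.length && data[i + j] == word[j]) j++;
            if (j == word.length) count++;
        }
        return count;
    }

    /** Map the file in overlapping chunks and count `needle` across a thread pool. */
    public static int search(Path file, String needle, int chunkSize) throws Exception {
        byte[] word = needle.getBytes(StandardCharsets.UTF_8);
        int overlap = word.length - 1;           // a match can straddle a chunk boundary
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            List<Future<Integer>> parts = new ArrayList<>();
            for (long pos = 0; pos < size; pos += chunkSize) {
                long len = Math.min((long) chunkSize + overlap, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                parts.add(pool.submit(() -> {
                    byte[] data = new byte[buf.remaining()];
                    buf.get(data);               // each buffer is used by one task only
                    return countInRegion(data, word, chunkSize);
                }));
            }
            int total = 0;
            for (Future<Integer> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

This assumes a non-empty search word and chunks small enough to copy into an int-sized array; mapping stays read-only, so the tasks never contend on the file.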

Reading the file into the buffers will probably take just as long as doing the search itself: the extra processing required to search is tiny compared to the time needed to read the file off the disk, and in fact a single thread may well do that processing in the time it would otherwise just spend waiting for data.

Searching multiple locations in the file at once will be very slow on most storage systems, because it turns one sequential read into many random seeks.

The real question is whether you search each file only once or search it frequently. If only once, you have no real choice but to scan the file sequentially and accept the time it takes. If you search frequently, you could consider indexing the contents somehow.
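For the scan-once case, a single sequential pass is hard to beat. A minimal sketch (class and method names are illustrative, not from the original posts) that carries over the last word-length-minus-one bytes between reads, so matches spanning a read boundary are not missed:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamScan {

    /** One sequential pass over the file, counting (possibly overlapping) matches. */
    public static long count(File file, String needle) throws IOException {
        byte[] word = needle.getBytes(StandardCharsets.UTF_8);
        byte[] buf = new byte[1 << 16];          // 64 KiB window, assumed > word length
        int keep = word.length - 1;              // carry-over so boundary matches survive
        long count = 0;
        int filled = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            int n;
            while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
                filled += n;
                int lastStart = filled - word.length;   // last index a full match fits at
                for (int i = 0; i <= lastStart; i++) {
                    int j = 0;
                    while (j < word.length && buf[i + j] == word[j]) j++;
                    if (j == word.length) count++;
                }
                // Shift the unscanned tail (at most `keep` bytes) to the front.
                int from = Math.max(0, filled - keep);
                System.arraycopy(buf, from, buf, 0, filled - from);
                filled -= from;
            }
        }
        return count;
    }
}
```

The positions left unscanned in each pass are exactly the ones copied to the front, so no occurrence is counted twice.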

Consider using Hadoop MapReduce.

If you want to execute the threads (that is, the divided tasks) on different devices, the input file should live on a distributed file system such as HDFS (Hadoop Distributed File System). MapReduce is a mechanism that divides one job into multiple tasks and runs them on different machines in parallel.
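The split-then-combine idea behind MapReduce can be illustrated in plain JDK Java, without the Hadoop API (the class and method names below are invented for this sketch): a map phase counts the word in each split in parallel, and a reduce phase sums the partial counts.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class LocalMapReduce {

    // Map phase: one split (here, a batch of lines) yields a partial count of `word`.
    static long mapCount(List<String> split, String word) {
        long c = 0;
        for (String line : split) {
            int idx = 0;
            while ((idx = line.indexOf(word, idx)) != -1) { c++; idx++; }
        }
        return c;
    }

    // Reduce phase: run each split on its own worker, then sum the partial counts.
    public static long countAcrossSplits(List<List<String>> splits, String word)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, splits.size()));
        try {
            List<Future<Long>> partials = splits.stream()
                    .map(s -> pool.submit(() -> mapCount(s, word)))
                    .collect(Collectors.toList());
            long total = 0;
            for (Future<Long> f : partials) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real Hadoop the splits are HDFS blocks, the workers are separate machines, and the framework handles the shuffle and fault tolerance; the decomposition itself is the same.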

