I have to:
What I am doing:
Problem:
When I read through a BufferedReader wrapped around a RandomAccessFile, the file pointer seems to jump far ahead in a single call to BufferedReader.readLine(). However, if I use RandomAccessFile.readLine() directly, the file pointer moves forward properly, step by step, one line at a time.
Using BufferedReader as a wrapper:
RandomAccessFile randomAccessFile = new RandomAccessFile("mybigfile.txt", "r");
BufferedReader brRafReader = new BufferedReader(new FileReader(randomAccessFile.getFD()));
String line;
while ((line = brRafReader.readLine()) != null) {
    System.out.println(line + ", Position : " + randomAccessFile.getFilePointer());
}
Output:
Line goes here, Position : 13040
Line goes here, Position : 13040
Line goes here, Position : 13040
Line goes here, Position : 13040
Using RandomAccessFile.readLine() directly:
RandomAccessFile randomAccessFile = new RandomAccessFile("mybigfile.txt", "r");
String line;
while ((line = randomAccessFile.readLine()) != null) {
    System.out.println(line + ", Position : " + randomAccessFile.getFilePointer());
}
Output (this is as expected: the file pointer advances properly with each call to readLine()):
Line goes here, Position : 11011
Line goes here, Position : 11089
Line goes here, Position : 12090
Line goes here, Position : 13040
Could anyone tell me what I am doing wrong here? Is there any way I can speed up the reading process using RandomAccessFile?
The reason for the observed behavior is that, as the name suggests, the BufferedReader is buffered. It reads a larger chunk of data at once (into a buffer) and returns only the relevant parts of the buffer contents, namely the part up to the next \n line separator.
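To see the effect in isolation, here is a small self-contained sketch (the class FilePointerDemo and the temporary file are made up for this illustration) that records RandomAccessFile.getFilePointer() after every readLine() call, once through a BufferedReader wrapped around the RandomAccessFile's file descriptor, as in the question, and once with RandomAccessFile.readLine() directly. It assumes a test file that is much smaller than the reader's buffer, so the very first buffer fill consumes the whole file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FilePointerDemo {

    // File pointer after each BufferedReader.readLine() call.
    public static List<Long> positionsBuffered(Path file) throws IOException {
        List<Long> positions = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Same setup as in the question: the reader shares the RAF's descriptor
            BufferedReader reader = new BufferedReader(new FileReader(raf.getFD()));
            while (reader.readLine() != null) {
                positions.add(raf.getFilePointer());
            }
        }
        return positions;
    }

    // File pointer after each RandomAccessFile.readLine() call.
    public static List<Long> positionsDirect(Path file) throws IOException {
        List<Long> positions = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            while (raf.readLine() != null) {
                positions.add(raf.getFilePointer());
            }
        }
        return positions;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append("line ").append(i).append('\n'); // 7 bytes per line
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.US_ASCII));
        System.out.println("BufferedReader:   " + positionsBuffered(file));
        System.out.println("RandomAccessFile: " + positionsDirect(file));
        Files.delete(file);
    }
}
```

For this five-line file, the BufferedReader list should show the file length (35) for every line, while the direct list grows by 7 per line. With a larger file, the first list jumps in buffer-sized steps instead, which matches the output in the question.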
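I think there are, broadly speaking, two possible approaches:
1. Do the buffering yourself on top of the RandomAccessFile, and keep track of the file position of each line.
2. Keep using the BufferedReader, and compute the real file position from the file pointer plus the reader's current position inside its buffer.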
For 1., you would no longer use RandomAccessFile#readLine. Instead, you'd do your own buffering via
byte[] buffer = new byte[8192];
...
// In a loop:
int read = randomAccessFile.read(buffer);
// Figure out where a line break `\n` appears in the buffer,
// return the resulting lines, and take the position of the `\n`
// into account when storing the "file pointer"
As the vague comment indicates, this may be cumbersome and fiddly. You'd basically re-implement what the readLine method does in the BufferedReader class. And at this point, I don't even want to mention the headaches that different line separators or character sets could cause.
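A minimal sketch of approach 1, under stated assumptions: the file is UTF-8 (or plain ASCII), and lines end with \n, optionally preceded by \r. Unlike RandomAccessFile.readLine(), a lone \r is not treated as a line break. ManualLineReader, its Line class and readLines are hypothetical names. The sketch reads 8192-byte blocks with RandomAccessFile.read(byte[]), collects the bytes of the current line across block boundaries, and records for each line the byte offset just past its \n, which is the value getFilePointer() would report after RandomAccessFile.readLine(). Splitting on the byte 0x0A is safe in UTF-8 because that byte never occurs inside a multi-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ManualLineReader {

    // One decoded line plus the byte offset just past its '\n', i.e. the value
    // getFilePointer() would report after RandomAccessFile.readLine().
    public static final class Line {
        public final String text;
        public final long endOffset;

        Line(String text, long endOffset) {
            this.text = text;
            this.endOffset = endOffset;
        }
    }

    // Hypothetical helper. ASSUMPTION: UTF-8 (or ASCII) text with '\n' or
    // "\r\n" line endings; a lone '\r' is NOT treated as a line break.
    public static List<Line> readLines(String fileName) throws IOException {
        List<Line> lines = new ArrayList<>();
        byte[] buffer = new byte[8192];
        // Bytes of the current line, which may span several block reads
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long blockStart = 0; // file offset of buffer[0]
            int read;
            while ((read = raf.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        lines.add(new Line(decode(current), blockStart + i + 1));
                        current.reset();
                    } else {
                        current.write(buffer[i]);
                    }
                }
                blockStart += read;
            }
            if (current.size() > 0) { // last line without a trailing '\n'
                lines.add(new Line(decode(current), blockStart));
            }
        }
        return lines;
    }

    // Drops a trailing '\r' and decodes the collected bytes as UTF-8.
    private static String decode(ByteArrayOutputStream bytes) {
        byte[] b = bytes.toByteArray();
        int length = b.length;
        if (length > 0 && b[length - 1] == '\r') {
            length--;
        }
        return new String(b, 0, length, StandardCharsets.UTF_8);
    }
}
```

A stored endOffset can later be passed to RandomAccessFile.seek(long) to continue at the following line. Handling other separators or encodings would need more code, which is exactly the fiddly part mentioned above.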
For 2., you could simply access the field of the BufferedReader that stores the buffer offset. This is implemented in the example below. Of course, this is a somewhat crude solution, but it is mentioned and shown here as a simple alternative, depending on how "sustainable" the solution should be and how much effort you are willing to invest.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class LargeFileRead {

    public static void main(String[] args) throws Exception {
        String fileName = "myBigFile.txt";
        long before = System.nanoTime();
        List<String> result = readBuffered(fileName);
        //List<String> result = readDefault(fileName);
        long after = System.nanoTime();
        double ms = (after - before) / 1e6;
        System.out.println("Reading took " + ms + "ms "
            + "for " + result.size() + " lines");
    }

    private static List<String> readBuffered(String fileName) throws Exception {
        List<String> lines = new ArrayList<>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        BufferedReader brRafReader = new BufferedReader(
            new FileReader(randomAccessFile.getFD()));
        String line = null;
        // File position where the chunk currently held in the buffer starts
        long currentOffset = 0;
        // File pointer observed after the previous buffer fill (-1 = none yet)
        long previousOffset = -1;
        while ((line = brRafReader.readLine()) != null) {
            // The file pointer only changes when the reader refills its buffer,
            // so a change means that the new chunk starts at the old pointer
            long fileOffset = randomAccessFile.getFilePointer();
            if (fileOffset != previousOffset) {
                if (previousOffset != -1) {
                    currentOffset = previousOffset;
                }
                previousOffset = fileOffset;
            }
            // Start of the chunk plus the position inside the buffer
            int bufferOffset = getOffset(brRafReader);
            long realPosition = currentOffset + bufferOffset;
            System.out.println("Position : " + realPosition
                + " with FP " + randomAccessFile.getFilePointer()
                + " and offset " + bufferOffset);
            lines.add(line);
        }
        return lines;
    }

    // Reads the private field "nextChar" of BufferedReader, i.e. the index of
    // the next char to be returned from its buffer. On Java 16 and later,
    // setAccessible(true) throws InaccessibleObjectException unless the JVM is
    // started with --add-opens java.base/java.io=ALL-UNNAMED
    private static int getOffset(BufferedReader bufferedReader) throws Exception {
        Field field = BufferedReader.class.getDeclaredField("nextChar");
        int result = 0;
        try {
            field.setAccessible(true);
            result = (Integer) field.get(bufferedReader);
        } finally {
            field.setAccessible(false);
        }
        return result;
    }

    private static List<String> readDefault(String fileName) throws Exception {
        List<String> lines = new ArrayList<>();
        RandomAccessFile randomAccessFile = new RandomAccessFile(fileName, "r");
        String line = null;
        while ((line = randomAccessFile.readLine()) != null) {
            System.out.println("Position : " + randomAccessFile.getFilePointer());
            lines.add(line);
        }
        return lines;
    }
}
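(Note: The offsets may still appear to be off by 1, because the line separator is not fully taken into account in the position; for example, with \r\n line endings, readLine() returns before the \n has been consumed. This could be adjusted if necessary. Also, nextChar counts characters, not bytes, so the computed position only equals the byte position for a single-byte encoding such as ASCII.)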
NOTE: This is only a sketch. The RandomAccessFile objects should be closed properly when reading is finished, but how best to do that depends on how the reading is supposed to be interrupted when the time limit described in the question is exceeded.
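If relying on a private JDK field is not acceptable, a different technique (not the one from this answer) is to keep the BufferedReader for speed, track the byte offset yourself, and resume by seeking the RandomAccessFile before wrapping it. The sketch below assumes UTF-8 text in which every line ends with a single \n (with \r\n endings the count would be one byte short per line); OffsetTrackingReader and readFrom are hypothetical names. It relies on the documented fact that the FileChannel returned by RandomAccessFile.getChannel() shares its position with the RandomAccessFile, so reading through Channels.newInputStream starts at the seeked offset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class OffsetTrackingReader {

    // Hypothetical helper: reads up to maxLines lines starting at byte offset
    // startOffset, appends them to out, and returns the byte offset right after
    // the last line read (pass it as startOffset of the next call).
    // ASSUMPTION: UTF-8 text where every line ends with a single '\n'.
    public static long readFrom(String fileName, long startOffset, int maxLines,
                                List<String> out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(startOffset); // the channel below shares this position
            BufferedReader reader = new BufferedReader(new InputStreamReader(
                    Channels.newInputStream(raf.getChannel()), StandardCharsets.UTF_8));
            long offset = startOffset;
            int count = 0;
            String line;
            while (count < maxLines && (line = reader.readLine()) != null) {
                out.add(line);
                // Encoded length of the line plus one byte for the '\n'
                offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
                count++;
            }
            return offset; // closing raf also closes its channel
        }
    }
}
```

A run interrupted by a time limit then only needs to store one number. Re-encoding each line with getBytes costs some extra time; if the last line has no trailing \n, the returned offset is one past the end of the file, which is harmless because a later read simply finds nothing.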
BufferedReader reads a block of data from the file (8192 characters by default). Finding the line breaks in order to return the next line is then done in that buffer.
I guess this is why you see such a huge increment in the physical file position.
RandomAccessFile.readLine(), on the other hand, does not use a buffer. It reads the file byte after byte, with a separate read() call for each byte. That's really slow.
How is the performance if you just use a BufferedReader and remember the line you need to continue from?
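A minimal sketch of that suggestion, assuming UTF-8 text; ResumableLineReader, processChunk and the maxLines budget are hypothetical (a real version would check the elapsed time instead of a line count). Each call skips the lines handled by earlier calls and returns the index of the next line to process, which is the only state that has to be remembered between runs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ResumableLineReader {

    // Hypothetical helper: handles at most maxLines lines, starting at the
    // 0-based line index startLine, and returns the index to continue from.
    // ASSUMPTION: the file is UTF-8 encoded.
    public static int processChunk(Path file, int startLine, int maxLines,
                                   List<String> out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            // Skip the lines that an earlier call has already handled
            for (int i = 0; i < startLine; i++) {
                if (reader.readLine() == null) {
                    return i; // the file has fewer lines than startLine
                }
            }
            int lineIndex = startLine;
            String line;
            // Check the budget before reading, so no line is consumed but left unprocessed
            while (lineIndex - startLine < maxLines && (line = reader.readLine()) != null) {
                out.add(line); // "processing" here just collects the line
                lineIndex++;
            }
            return lineIndex;
        }
    }
}
```

Skipping still reads the earlier lines again, but through a BufferedReader this is typically much cheaper than byte-by-byte RandomAccessFile.readLine(). If even that re-read is too slow, remembering a byte offset and seeking to it avoids it.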