Is it possible to efficiently skip a line of unknown size in Java?

When reading very large files (GB scale) in Java, I know exactly which lines I need to process. However, I do not know the length of each line, and the lengths can differ.

My question is the following:

Is there an efficient way to skip the lines I don't need? My (naive) approach is to read each line and simply not process it, but that sounds like a waste of time and memory.

The code I'm looking for could look like this:

SortedMap goodLineNumbers = ......

int currentLineNumber = 1;
String line;

try (BufferedReader br = new BufferedReader(new FileReader(tracefile))) {
    do {
        if (goodLineNumbers.containsKey(currentLineNumber)) {
            line = br.readLine();
            // process line
        } else {
            // EfficientSkip is the hypothetical method I am asking about:
            // it should skip one line whose length is not known in advance.
            line = EfficientSkip(br);
        }
        currentLineNumber++;
    } while (line != null);
} catch (IOException e) {
    e.printStackTrace();
}

If you don't want BufferedReader to create Strings for lines you don't need, read the input character by character, count lines by their EOL markers, and call BufferedReader.readLine() once you are at the beginning of a line you need. I am not sure it will improve overall performance, though.
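The character-based idea above could be sketched roughly as follows. The class name CharSkip and the helpers skipLine and readLineAt are made up for this illustration, and the sketch assumes lines end in \n, \r, or \r\n. It avoids allocating a String per skipped line, but every character is still read, so whether it is actually faster would need to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CharSkip {
    /**
     * Hypothetical helper: consumes characters up to and including the next
     * line terminator (\n, \r, or \r\n) without building a String.
     * Returns false if the stream was already at its end.
     */
    static boolean skipLine(BufferedReader br) throws IOException {
        int c = br.read();
        if (c == -1) {
            return false;
        }
        while (c != -1 && c != '\n' && c != '\r') {
            c = br.read();
        }
        if (c == '\r') {
            // Treat \r\n as one terminator: peek at the next character.
            br.mark(1);
            int next = br.read();
            if (next != '\n' && next != -1) {
                br.reset();
            }
        }
        return true;
    }

    /** Returns line number target (1-based), or null if the input is shorter. */
    static String readLineAt(BufferedReader br, int target) throws IOException {
        for (int current = 1; current < target; current++) {
            if (!skipLine(br)) {
                return null;
            }
        }
        return br.readLine();
    }

    public static void main(String[] args) throws IOException {
        String sample = "a\nbb\r\nccc\nd";
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            System.out.println(readLineAt(br, 3)); // prints "ccc"
        }
    }
}
```
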

Try using a LineNumberReader instead. You can get / set the current line to read. That way, you could just access and read those lines you want. Period.

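Thanks to Dima for pointing out that LineNumberReader cannot access lines by number either: setLineNumber() only changes the value returned by getLineNumber(); it does not move the position in the stream.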
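A small experiment illustrates why LineNumberReader does not help here. The class and method names are invented for this example. After setLineNumber(2), the next readLine() still returns the first line of the input, and only the counter is affected.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

public class LineNumberReaderDemo {
    /** Sets the line counter, reads one line, and reports both line and counter. */
    static String firstLineAfterSet(String text, int fakeLineNumber) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            reader.setLineNumber(fakeLineNumber); // changes the counter only, no seeking
            String line = reader.readLine();      // still the first line of the input
            return line + " @" + reader.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        // The counter was set to 2 and then incremented by one readLine().
        System.out.println(firstLineAfterSet("first\nsecond\nthird", 2)); // prints "first @3"
    }
}
```
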

Thinking more on the problem, it is theoretically impossible to determine at what point in the file a certain line begins, unless one either: A) has prior knowledge of the (combined) length of previous lines, or B) reads the whole file up to that given point (with or without processing the contents).

There is no magic. To know how many lines you have read, you have to read them one by one and count. You don't have to store the useless lines (while (count++ < nextGoodNumber && reader.readLine() != null); will do), but you do have to read them one by one.
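Spelled out a little more fully, the read-and-discard loop might look like the sketch below. SelectLines and select are illustrative names, and the sketch assumes the wanted line numbers are 1-based and held in a sorted set. Skipped lines are read but never stored, and reading stops after the last wanted line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class SelectLines {
    /** Returns the lines whose 1-based numbers are in goodLineNumbers, in file order. */
    static List<String> select(BufferedReader reader, SortedSet<Integer> goodLineNumbers)
            throws IOException {
        List<String> result = new ArrayList<>();
        int count = 0; // number of lines consumed so far
        for (int nextGoodNumber : goodLineNumbers) {
            // Read and discard everything before the next wanted line.
            while (count < nextGoodNumber - 1) {
                if (reader.readLine() == null) {
                    return result; // file ended early
                }
                count++;
            }
            String line = reader.readLine();
            if (line == null) {
                return result;
            }
            count++;
            result.add(line);
        }
        return result; // nothing after the last wanted line is read
    }

    public static void main(String[] args) throws IOException {
        String sample = "one\ntwo\nthree\nfour\nfive";
        SortedSet<Integer> wanted = new TreeSet<>(Arrays.asList(2, 4));
        try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
            System.out.println(select(br, wanted)); // prints "[two, four]"
        }
    }
}
```
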

If you control the file's format, you can write the length of each line in front of it, as a kind of header. This lets you jump from line to line without reading each one to its end. For this task you might use RandomAccessFile instead of BufferedReader.

readLong() - read the line's length

readLine() - if this is a required line

skipBytes(int n) - otherwise
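A minimal sketch of such a length-prefixed format follows. It departs from the calls listed above in two ways. It stores the length with writeInt/readInt rather than a long, because skipBytes takes an int. It reads the wanted record with readFully rather than readLine, because the byte count is already known. The class name, the method names, and the UTF-8 encoding are all assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixedLines {
    /** Writes each line as a 4-byte length header followed by its UTF-8 bytes. */
    static void write(File file, List<String> lines) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // start from an empty file
            for (String line : lines) {
                byte[] bytes = line.getBytes(StandardCharsets.UTF_8);
                raf.writeInt(bytes.length);
                raf.write(bytes);
            }
        }
    }

    /** Returns record number target (1-based), or null if the file has fewer records. */
    static String read(File file, int target) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            for (int current = 1; ; current++) {
                if (raf.getFilePointer() >= raf.length()) {
                    return null; // ran out of records
                }
                int length = raf.readInt();
                if (current == target) {
                    byte[] bytes = new byte[length];
                    raf.readFully(bytes);
                    return new String(bytes, StandardCharsets.UTF_8);
                }
                // Jump over the payload without reading its contents.
                raf.skipBytes(length);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("length-prefixed", ".bin");
        try {
            write(file, Arrays.asList("alpha", "", "gamma"));
            System.out.println(read(file, 3)); // prints "gamma"
        } finally {
            file.delete();
        }
    }
}
```

The headers still have to be read one after another, so reaching record n costs n small reads and seeks, but the payload bytes of unwanted records are never touched.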
