简体   繁体   中英

Java: What's the most efficient way to read relatively large txt files and store its data?

I was supposed to write a method that reads a DNA sequence in order to test some string matching algorithms on it.

I took some existing code I use to read text files (don't really know any others):

try {
    FileReader fr = new FileReader(file);
    BufferedReader br = new BufferedReader(fr);

    while((line = br.readLine()) != null) {
        seq += line;
    }

    br.close();
}
catch(FileNotFoundException e) { e.printStackTrace(); }
catch(IOException e) { e.printStackTrace(); }

This seems to work just fine for small text files with ~3000 characters, but it takes forever (I just cancelled it after 10 minutes) to read files containing more than 45 million characters.

Is there a more efficient way of doing this?

One thing I notice is that you are doing seq+=line. seq is probably a String? If so, then you have to remember that strings are immutable. So in fact what you are doing is creating a new String each time you are trying to append a line to it. Please use StringBuilder instead. Also, if possible you don't want to do create a string and then process. That way you have to do it twice. Ideally you want to process as you read, but I don't know your situation.

The main element slowing your progress is the "concatenation" of the String seq and line when you call seq+=line. I use quotes for concatenation because in Java, Strings cannot be modified once they are created (eg immutable as user1598503 mentioned). Initially, this is not an issue, as the Strings are small, however once the Strings become very long, ee hundreds of thousands of characters, memory must be reallocated for the new String, which takes quite a bit of time. StringBuilder will allow you to do these concatenations in place, meaning you will not be creating a new Object every single time.

Your problem is not that the reading takes too much time, but the concatenating takes too much time. Just to verify this I ran your code (didn't finish) and then simply comented line 8 (seq += line) and it ran in under a second. You could try using seq = seq.concat(line) since it has been reported to be quite a bit faster most of the times, but I tried that too and didn't ran under 1-2 minutes (for a 9.6mb input file). My solution would be to store your lines in an ArrayList (or a container of your choice). The ArrayList example worked in about 2-3 seconds with the same input file. (so the content of your while loop would be list.add(line);). If you really, really want to store your entire file in a string you could do something like this (using the Scanner class):

String content = new Scanner(new File("input")).useDelimiter("\\Z").next();

^^This works in a matter of seconds as well. I should mention that "\\Z" is the end of file delimiter so that's why it reads the whole thing in one swoop.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM