
In Java, how do I efficiently read x lines from a file, close it, and then reopen it, starting at line x, and continue reading?

BufferedReader has easy methods for reading a file line by line, but there doesn't seem to be any way to keep track of where you are so you can get back to that place later. FileInputStream has getChannel(), which returns a FileChannel that can tell you the current position in the stream. So if you give a BufferedReader a FileInputStream to read from, you can find out where the BufferedReader stopped reading in the FileInputStream, and you can also set the FileInputStream to that position before you give it to the BufferedReader.
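The seek-then-wrap part of that plan can be sketched as follows. It assumes you already know a correct byte offset and that the file is UTF-8; `openAt` is a hypothetical helper name, not a JDK API. Moving the FileInputStream's channel also moves the stream's own file position, so the reader starts decoding at that offset.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ResumeFromOffset {
    // Hypothetical helper: open the file, move the underlying channel to
    // byteOffset, and only then wrap the stream in a BufferedReader.
    static BufferedReader openAt(String path, long byteOffset) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            in.getChannel().position(byteOffset); // seek before any buffering happens
        } catch (IOException e) {
            in.close();
            throw e;
        }
        // Assumes UTF-8 content; closing the returned reader also closes the stream.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

This only helps if the offset you pass in is right, which is exactly the difficulty described next.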

The problem is that the BufferedReader reads ahead in the file, so the position of the FileInputStream is not the same as the position in the BufferedReader. You may have read 20 lines from the BufferedReader, but the BufferedReader may have read 30 from the FileInputStream. If you later reopen the file at the FileInputStream's position, you will have missed the intervening 10 lines.
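A small demonstration of the read-ahead, assuming a file short enough to fit in the reader's internal buffer (the buffer size is a JDK implementation detail, so the exact channel position is not guaranteed): after only two lines have been returned, the channel position is already past them. `channelPositionAfter` is a hypothetical helper.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

final class ReadAheadDemo {
    // Reads up to linesToRead lines and reports the channel position afterwards.
    // The result reflects how far buffering pulled from the file, not how many
    // bytes the returned lines actually occupy.
    static long channelPositionAfter(String path, int linesToRead) throws IOException {
        try (FileInputStream in = new FileInputStream(path);
             BufferedReader r = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            FileChannel ch = in.getChannel();
            for (int i = 0; i < linesToRead && r.readLine() != null; i++) {
                // discard the line; only the channel position matters here
            }
            return ch.position(); // evaluated before the resources are closed
        }
    }
}
```

For a 30-line file of a few hundred bytes, reading two lines typically leaves the channel at the end of the file.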

I could read character by character from the InputStream, but it seems like there is probably a better way...

This is an extremely difficult problem to solve using existing Java classes. For one thing, you've ignored the fact that you can't actually pass an InputStream to a BufferedReader; you need to pass in a Reader (for example, an InputStreamReader wrapping the stream).

Files deal in bytes, but Readers deal in characters. Since a single character can take up a varying number of bytes depending on the character set, you would need to record how many bytes each character took up to compute how many bytes a given number of characters occupies in the file.
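One way around the byte/character mismatch is to skip the Reader entirely and count bytes yourself. The sketch below is an illustrative approach, not one the answer proposes: it reads bytes through a BufferedInputStream (so single-byte reads do not hit the disk each time), splits on the 0x0A byte, and decodes each line afterwards. It assumes UTF-8, where 0x0A never appears inside a multi-byte character; it would not work for UTF-16. `ByteOffsetLineReader` is a hypothetical class.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: tracks the exact byte offset of the next unread line.
// Assumes UTF-8 text with "\n" or "\r\n" line endings.
final class ByteOffsetLineReader implements Closeable {
    private final InputStream in;
    private long offset; // byte offset of the next unread line

    ByteOffsetLineReader(String path, long startOffset) throws IOException {
        FileInputStream fis = new FileInputStream(path);
        try {
            fis.getChannel().position(startOffset); // seek before buffering
        } catch (IOException e) {
            fis.close();
            throw e;
        }
        this.in = new BufferedInputStream(fis);
        this.offset = startOffset;
    }

    /** Returns the next line without its terminator, or null at end of file. */
    String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            offset++; // every consumed byte counts, including the terminator
            if (b == '\n') break;
            buf.write(b);
        }
        if (!sawAny) return null;
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--; // tolerate CRLF endings
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }

    /** Offset to pass back to the constructor to resume reading later. */
    long getOffset() {
        return offset;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Read some lines, save `getOffset()`, close the reader, and later construct a new one with the saved offset to continue where you left off.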

If you are willing to go for a very fragile approach, you could assume that every byte in your file represents one character (e.g. ASCII) and that every line is terminated by "\n". Then it would just be a matter of recording how many characters you've read. Something like this:

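import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Fragile: assumes one byte per character and "\n"-only line endings.
public class CountingBufferedReader extends BufferedReader {
    private long position = 0; // bytes consumed via readLine() so far

    public CountingBufferedReader(Reader in) {
        super(in);
    }

    @Override
    public String readLine() throws IOException {
        String line = super.readLine();
        if (line != null) { // readLine() returns null at end of stream
            position += line.length() + 1; // +1 for the "\n" terminator
        }
        return line;
    }

    public long getPosition() {
        return position;
    }
}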

Making it work generically for any input and any character set is much more difficult, and doing it efficiently would probably involve rewriting many existing classes.

It may not answer your question completely, but Apache Commons IO's IOUtils will read the stream into a List: http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#readLines%28java.io.InputStream%29

Then you can access any line in the list directly. Note that IOUtils does not close the stream for you, so close it yourself once readLines returns.

From their docs:

public static List readLines(InputStream input)
                      throws IOException

    Get the contents of an InputStream as a list of Strings, one entry per line, using the default character encoding of the platform.

    This method buffers the input internally, so there is no need to use a BufferedInputStream.

    Parameters:
        input - the InputStream to read from, not null 
    Returns:
        the list of Strings, never null 
    Throws:
        NullPointerException - if the input is null 
        IOException - if an I/O error occurs
    Since:
        Commons IO 1.1
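The same read-everything-then-index idea also works without the Commons IO dependency. This sketch swaps in the JDK's `java.nio.file.Files.readAllLines`, which takes a path and closes the file once it has been read. It assumes UTF-8 and a file small enough to hold in memory; `linesFrom` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ReadAllThenSlice {
    // Hypothetical helper: loads the whole file, then returns lines
    // [fromLine, fromLine + count), clamped to the end of the file.
    static List<String> linesFrom(Path file, int fromLine, int count) throws IOException {
        List<String> all = Files.readAllLines(file, StandardCharsets.UTF_8);
        int start = Math.min(fromLine, all.size());
        int end = (int) Math.min((long) start + count, all.size()); // avoid int overflow
        return all.subList(start, end);
    }
}
```

The returned `subList` is a view of the loaded list; copy it (for example with `new ArrayList<>(...)`) if you need an independent list.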

Use the java.io.LineNumberReader class. It tells you which line you're currently on. Next time, just skip that many lines before you continue reading. That does make your problem O(N*X), where N is the number of lines in the file and X is the number of times you read it, but then why are you only reading part of the file anyway?
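A minimal sketch of the LineNumberReader approach, assuming UTF-8 text; `readChunk` is a hypothetical helper. Each call reopens the file, reads past the lines handled in earlier passes, and then collects the next chunk.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineNumberResume {
    // Hypothetical helper: skips the first startLine lines, then returns up to
    // count lines. Each call rescans the file from the top.
    static List<String> readChunk(String path, int startLine, int count) throws IOException {
        List<String> out = new ArrayList<>();
        try (LineNumberReader r = new LineNumberReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            // getLineNumber() counts the line terminators consumed so far
            while (r.getLineNumber() < startLine && r.readLine() != null) {
                // skip lines that an earlier pass already handled
            }
            String line;
            while (out.size() < count && (line = r.readLine()) != null) {
                out.add(line);
            }
        }
        return out;
    }
}
```

To continue, call it again with `startLine` increased by the number of lines the previous call returned.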

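Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.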

 