Is it possible for BufferedReader.readLine() to not read a whole line when reading from a slow stream?

I'm experiencing a strange error with one of our systems that I am at a loss to explain. In our system the backend generates a large TSV output file, which we then serve over HTTP using the following code:

    BufferedInputStream input = new BufferedInputStream(p.getInputStream(), (int)FileUtils.BYTES_PER_MEGABYTE * 16);
    OutputStream output = resp.getOutputStream();
    byte[] buffer = new byte[(int) (FileUtils.BYTES_PER_KILOBYTE * 8)];
    do
    {
        int read = input.read(buffer);
        if (read <= 0) break;
        output.write(buffer);           
    } while (true);
    input.close();
    output.close();

On the client side, a TSV parser consumes the HTTP response. On very large inputs we start seeing strange artifacts: the parser reports a line as having the wrong number of items, and the line printed in the error message is a random chunk of data, i.e. not an entire line of the data.

My first thought was that the generated TSV was malformed, but I've pretty much ruled this out: I copied the file directly from the backend system and ran it through three independently written open source TSV parsers (including the one the client code uses), and all of them parse the local file without problems.

For reference, the code for the TSV parser we're using is here.

This leads me to two possibilities:

  1. The code I've shown for copying the file across HTTP is flawed in some way, in which case I'd love for someone to point out what dumb but non-obvious mistake I've made!
  2. BufferedReader.readLine(), which the consuming parser uses, is not guaranteed to read whole lines? I wouldn't be entirely surprised if this were the case, as I've been bitten by strange read behavior over slow network streams in .Net, so I wonder whether a similar problem can apply in Java?

Or is there some other explanation I've overlooked?
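
On possibility 2: as far as the documented behaviour goes, BufferedReader.readLine() keeps reading until it sees a line terminator or the end of the stream, so a slow source should delay lines rather than split them. Below is a minimal sketch for checking this locally. The `SlowLineDemo` class, the `trickle` and `readLines` helpers and the sample TSV are all hypothetical, and `trickle` only simulates short reads (a few bytes per call), not real network latency:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from the original system.
final class SlowLineDemo {

    // Simulates a trickling stream: each read call hands back at most maxChunk bytes.
    static InputStream trickle(byte[] data, int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    // Collects every line that BufferedReader.readLine() returns until end of stream.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] tsv = "id\tname\n1\talpha\n2\tbeta\n".getBytes(StandardCharsets.UTF_8);
        // Three bytes per read, so every line arrives split across several reads.
        for (String line : readLines(trickle(tsv, 3))) {
            System.out.println(line.split("\t").length + " fields: " + line);
        }
    }
}
```

If the lines come back whole here, truncated lines on the client are more likely to come from corrupted bytes in the stream than from readLine() itself.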

While posting this question I suddenly spotted what the error was (typical!).

The following portion of the code I posted for copying the file is incorrect:

    int read = input.read(buffer);
    if (read <= 0) break;
    output.write(buffer);

It should instead be as follows:

    int read = input.read(buffer);
    if (read <= 0) break;
    output.write(buffer, 0, read);

The problem was that I always wrote the whole buffer to the output stream, even when we had read fewer bytes from the input than the size of the buffer. This meant that at the end of the file we'd write the last chunk of the data plus whatever was left in the rest of the buffer, hence the random chunk of data left over!
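
The same fix as a complete copy loop, as a minimal sketch. The `StreamCopy` class, the `copy` and `trickle` names and the sample data are hypothetical, not part of the original serving code. `trickle` caps each read at a few bytes to imitate a source that delivers data in small pieces, since `InputStream.read(byte[])` is allowed to return fewer bytes than the buffer holds at any point in the stream, not only at the end:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative, not from the original code.
final class StreamCopy {

    // Copies everything from in to out, writing only the bytes each read() returned.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int read;
        // read() returns -1 at end of stream and may fill only part of the buffer.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    // Simulates a trickling source: each read call hands back at most maxChunk bytes.
    static InputStream trickle(byte[] data, int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\tb\tc\n1\t2\t3\n".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = trickle(data, 5);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            boolean identical = Arrays.equals(data, out.toByteArray());
            System.out.println(copied + " bytes copied, identical to input: " + identical);
        }
    }
}
```

With `write(buffer)` in place of `write(buffer, 0, read)`, each 5-byte read in this sketch would push the full 8 KB buffer to the output, stale bytes included.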
