
What is wrong with FileInputStream.read(byte[])?

In response to my answer to a file-reading question, a commenter stated that FileInputStream.read(byte[]) is "not guaranteed to fill the buffer."

File file = /* ... */;
long len = file.length();
byte[] buffer = new byte[(int)len];
FileInputStream in = new FileInputStream(file);
in.read(buffer);

(The code assumes that the file length does not exceed 2 GB.)

Apart from an IOException, what could cause the read method to not retrieve the entire file contents?

EDIT:

The idea of the code (and the goal of the OP of the question I answered) is to read the entire file into a chunk of memory in one swoop; that's why buffer_size = file_size.

what could cause the read method to not retrieve the entire file contents?

If, for example, the file is fragmented on the filesystem and the low-level implementation knows that it will have to wait for the hard disk to seek to the next fragment (which takes a LOT of time relative to CPU operations), it would make sense for the read() call to return with part of the buffer unfilled, giving the application the chance to start working on the data it has already received.

Now, I don't know whether any implementation actually works like that, but the point is that you must not rely on the buffer being filled, because the API contract does not guarantee it.

Apart from an IOException, what could cause the read method to not retrieve the entire file contents?

In my own API implementation, and on my home-rolled file system, I simply choose to fill half the buffer... just kidding.

My point is that even if I wasn't kidding, technically speaking it wouldn't be a bug. It is a matter of method contract. The contract (documentation) in this case is:

Reads up to b.length bytes of data from this input stream into an array of bytes.

i.e., it gives no guarantee that the buffer will be filled.
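To make the "half the buffer" joke concrete, here is a minimal sketch of a stream that stays within that contract while never filling the buffer. The class name HalfFillInputStream is made up for illustration, and an in-memory ByteArrayInputStream stands in for a file so the behavior is deterministic.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that returns at most half of the requested bytes per call. */
class HalfFillInputStream extends FilterInputStream {
    HalfFillInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Still contract-compliant: when len > 0 and data remains,
        // at least one byte is read; it just is never the full request.
        int capped = (len <= 1) ? len : len / 2;
        return super.read(b, off, capped);
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[8];
        try (InputStream in = new HalfFillInputStream(new ByteArrayInputStream(new byte[8]))) {
            int n = in.read(buffer); // a single call, as in the question
            System.out.println("requested " + buffer.length + ", got " + n);
        }
    }
}
```

A caller that ignores the return value of read would silently treat the second half of the buffer as file data, even though it was never written.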

Depending on the API implementation, and perhaps on the file system, the read method may choose not to fill the buffer. It's basically a question of what the contract of the method says.


Bottom line: It probably works, but is not guaranteed to work.

Well, first off, you've set up a false dichotomy. One perfectly normal circumstance is that the buffer won't be filled because there aren't that many bytes left in the file. That is not an IOException, but it doesn't mean the whole file's contents have not been read.

The spec says the method will either return -1, indicating end-of-stream, or block until at least one byte is read. Implementers of InputStream can optimize as they see fit (e.g. a TCP stream might return data as soon as a packet comes in, regardless of the caller's choice of buffer size). A FileInputStream might fill the buffer with one block's worth of data. As the caller, all you know is that you need to keep reading until the method returns -1.
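The "keep on reading" advice can be sketched as a loop over the three-argument read. This is only an illustration: the class name ReadLoop and the helper readFully are made up here, and the demo uses an in-memory stream rather than a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    /**
     * Reads until the buffer is full or end-of-stream is reached.
     * Returns the number of bytes actually stored in the buffer.
     */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            // Ask only for the part of the buffer that is still empty.
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream before the buffer was filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[16];
        try (InputStream in = new ByteArrayInputStream(new byte[10])) {
            System.out.println(readFully(in, buffer)); // prints 10
        }
    }
}
```

Applied to the question's code, the same loop would wrap the FileInputStream, and a return value smaller than buffer.length would signal that the file was shorter than expected.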

Edit

In practice, with your example, the only circumstance I can see where the buffer wouldn't be filled (with a standard implementation) is if the file changed size after you allocated the buffer but before you started reading it. Since you haven't locked the file, this is a possibility.

People have talked about read on a FileInputStream hypothetically not filling the buffer. In fact, it is a reality in some circumstances:

  • If you open a FileInputStream on "/dev/tty" or a named pipe, then a read will only return the data that is currently available. Other device files may behave the same way. (These files will probably return 0L as the file size, though.)

  • A FUSE file system can be implemented to not completely fill the read buffer if the file system has been mounted with the direct_io option, or if a file is opened with the corresponding flag.

The above applies to Linux, but there could well be similar cases for other operating systems and/or Java implementations. The bottom line is that the javadocs allow this behavior, and you can get into trouble if your application assumes that it won't occur.

There are third-party libraries that implement "read fully" behavior; e.g. Apache Commons IO provides FileUtils.readFileToByteArray, IOUtils.toByteArray, and similar methods. If you want or need that behavior, you should use one of those libraries or implement it yourself.
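If adding a library is not an option, the JDK itself offers calls that loop internally. The sketch below is one possible arrangement, not the only one: the class and method names (ReadWholeFile, viaFiles, viaReadFully) and the temp file are made up for illustration, while Files.readAllBytes (Java 7+) and DataInputStream.readFully are standard JDK methods.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFile {
    /** Lets the JDK size the array and do the looping (Java 7+). */
    static byte[] viaFiles(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    /** Keeps the question's pre-sized buffer, but fails loudly if the file is short. */
    static byte[] viaReadFully(Path path) throws IOException {
        byte[] buffer = new byte[(int) Files.size(path)]; // assumes the file is under 2 GB
        try (DataInputStream in = new DataInputStream(new FileInputStream(path.toFile()))) {
            in.readFully(buffer); // throws EOFException if the stream ends early
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-demo", ".bin"); // hypothetical scratch file
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            System.out.println(viaFiles(tmp).length);
            System.out.println(viaReadFully(tmp).length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The second variant still has the race described earlier: if the file shrinks between the size query and the read, readFully throws instead of returning a partly filled buffer, which is usually the safer failure mode.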

It's not guaranteed to fill the buffer.

The file size may be smaller than the buffer, or the remainder of the file may be smaller than the buffer.

Your question is self-contradictory. There is no guarantee that it will fill the whole buffer, even if there are no imaginable circumstances in which it won't. There is no guarantee, so you can't assume it.


 