
How do you decide what byte[] size to use for InputStream.read()?

When reading from InputStreams, how do you decide what size to use for the byte[]?

int nRead;
byte[] data = new byte[16384]; // <-- this number is the one I'm wondering about

while ((nRead = is.read(data, 0, data.length)) != -1) {
  // ...do something with the first nRead bytes of data...
}

When do you use a small one vs a large one? What are the differences? Does the number need to be a multiple of 1024? Does it make a difference whether the InputStream comes from the network or from the disk?

Thanks much, I can't seem to find a clear answer elsewhere.

Most people use powers of 2 for the size. As long as the buffer is at least 512 bytes, the exact size doesn't make much difference (< 20%).

For network reads the optimal size can be 2 KB to 8 KB (the underlying packet size is typically up to ~1.5 KB). For disk access, the fastest size can be 8 KB to 64 KB. If you use 8 KB or 16 KB you won't have a problem.

Note that for network downloads, you will usually find that a single read doesn't fill the whole buffer. Wasting a few KB doesn't matter much for 99% of use cases.
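If you want to compare candidate sizes yourself rather than rely on the ranges above, a rough probe along these lines may help. This is only a sketch: the class name BufferSizeProbe and the drain helper are made up for illustration, and the in-memory ByteArrayInputStream is a stand-in, so the timings it prints say little about a real disk or socket. Swap in your own stream to get numbers that mean something.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of any library.
class BufferSizeProbe {

    /** Reads the stream to the end with a buffer of the given size and returns the byte count. */
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        // read() may return fewer bytes than buf.length; -1 signals end of stream
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[4 * 1024 * 1024]; // stand-in data, assumption for the demo
        for (int size : new int[] {512, 2048, 8192, 16384, 65536}) {
            long start = System.nanoTime();
            long total = drain(new ByteArrayInputStream(payload), size);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(size + "-byte buffer: " + total + " bytes in " + micros + " us");
        }
    }
}
```

A single pass like this is noisy (JIT warm-up, OS caching), so treat the output as a hint at best and repeat the measurement a few times before drawing conclusions.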

In that situation, I always use a reasonable power of 2, somewhere in the range of 2K to 16K. In general, different InputStreams will have different optimal values, but there is no easy way to determine the value.

To determine the optimal value, you'd need to know more about the exact type of InputStream you are dealing with, as well as things like the specifications of the hardware that is servicing the InputStream.

Worrying about this is probably a case of premature optimization.

It mostly depends on how much memory you have and how much data you expect to read. You don't want to block too often, so consider BenCole's answer; on the other hand, you don't want to process data in small chunks if your processing is slower than the actual reading.

I personally try to use a library and offload the task of choosing a buffer size to the library authors. After that, I promise myself never to read the library code, because it makes me mad.

I'd also say that, if you are reading from an InputStream (not from a ReadableByteChannel like a FileChannel or a SocketChannel), you should not care, as long as you wrap it in a BufferedInputStream with a "correct" buffer size: the internal buffer will take care of the reads for you, so you can focus on just reading the pieces you need.

In that case, the buffer size is probably what you're looking for, and I would redirect you to @Peter Lawrey's answer: 2-8 KB when the data is accessed from the network, or 32-64 KB when it's from a hard drive (a "chunk" of disk).
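As a minimal sketch of that wrapping approach: the class name BufferedReadSketch, the countNewlines helper, and the two constants are made up for illustration, and the constants just pick values from the ranges quoted above. The caller reads one byte at a time, and the BufferedInputStream turns that into larger reads on the underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; names and constants are assumptions.
class BufferedReadSketch {
    static final int NETWORK_BUFFER = 8 * 1024;  // from the 2-8 KB range above
    static final int DISK_BUFFER = 64 * 1024;    // from the 32-64 KB range above

    /** Counts '\n' bytes, reading byte by byte through a buffer of the given size. */
    static int countNewlines(InputStream raw, int bufferSize) throws IOException {
        // The second constructor argument sets the internal buffer size.
        try (InputStream in = new BufferedInputStream(raw, bufferSize)) {
            int count = 0;
            int b;
            while ((b = in.read()) != -1) { // served from the internal buffer
                if (b == '\n') {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "one\ntwo\nthree\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countNewlines(new ByteArrayInputStream(text), NETWORK_BUFFER));
    }
}
```

If you call the one-argument constructor instead, BufferedInputStream falls back to its own default buffer size, which is often good enough.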

When reading from a ByteChannel, though, you'll have to do the buffering yourself through a ByteBuffer, which you can allocate with that size.
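For the channel case, the read loop could look roughly like this. It is a sketch that assumes a blocking channel; ChannelReadSketch and drain are hypothetical names, and the demo channel is built over an in-memory stream with Channels.newChannel just so the example runs without a file or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical example class for illustration.
class ChannelReadSketch {

    /** Reads a blocking channel to the end and returns the number of bytes seen. */
    static long drain(ReadableByteChannel channel, int bufferSize) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize); // you pick the size here
        long total = 0;
        while (channel.read(buffer) != -1) {
            buffer.flip();                 // switch from filling to draining
            total += buffer.remaining();   // real code would process the bytes here
            buffer.clear();                // make the whole buffer writable again
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[20_000]; // stand-in data, assumption for the demo
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(payload))) {
            System.out.println(drain(ch, 8 * 1024));
        }
    }
}
```

A non-blocking channel can return 0 from read, so this loop would need a selector or some other wait in that setup.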

By using the available() method in the InputStream class. From the Javadoc:

Returns the number of bytes that can be read (or skipped over) from this input stream without blocking by the next caller of a method for this input stream. The next caller might be the same thread or another thread.
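If you go this route, one cautious way to use it is as a hint that gets clamped to a sane range, since available() can legitimately return 0 or a very large number. Newer versions of the Javadoc also warn that it is never correct to use the return value to allocate a buffer intended to hold all the data in the stream. The sketch below assumes that clamping policy; AvailableHintSketch, chooseBufferSize, and the min/max bounds are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the clamping policy is an assumption, not a documented rule.
class AvailableHintSketch {

    /** Uses available() as a hint, but never goes below min or above max. */
    static int chooseBufferSize(InputStream in, int min, int max) throws IOException {
        int hint = in.available(); // may be 0 even when more data will arrive later
        return Math.max(min, Math.min(max, hint));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[5000]);
        byte[] data = new byte[chooseBufferSize(in, 2048, 16384)];
        int nRead;
        long total = 0;
        while ((nRead = in.read(data, 0, data.length)) != -1) {
            total += nRead;
        }
        System.out.println("buffer=" + data.length + " total=" + total);
    }
}
```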
