
Efficient read/write approach with Java NIO

Original post: 2012-08-17 01:57:01 · Tags: java, nio

Let's say we have a SocketChannel (in non-blocking mode) that is registered with a Selector for read interest. After select(), the Selector tells us that this channel is ready for reading, and we have a ByteBuffer. We want to read some bytes from the channel into this buffer (the ByteBuffer is cleared before reading). For this we use the channel's read() method, which returns the actual number of bytes read. Suppose this number is positive after the read and the ByteBuffer's hasRemaining() method returns true. Is it practical in this situation to immediately try to read some more from the same channel? The same question applies to write(): if write() returns a positive value and not all of the buffer's contents were sent, is it practical to immediately try again until write() returns zero?

2 answers

It all depends on the rate at which data is arriving and on the latency requirements of your application. If you don't care about latency at all, you might get slightly higher bandwidth by delaying your read interest until you suspect enough data has arrived to fill your buffer.

You have to be careful, though. Delaying reads could force the kernel to buffer more data, possibly fill its own buffer, and then start dropping packets or otherwise engage some flow control. That will more than cancel out any benefit described in the previous paragraph.

So generally, you want to read as much as you can, as early as you can. The benefits of batching reads are minor at best, and the potential pitfalls can be major. Keep in mind that seeing non-full reads means you're processing the data faster than it is coming in. In other words, you have CPU to burn, so the extra overhead of smaller reads is essentially free.

If you get a short read result (fewer bytes than the buffer had room for), there is no more data to read without blocking, so you must not read again until there is. Otherwise the next read will almost certainly return zero or -1.
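The short-read behaviour can be observed without a network. The following is only a minimal sketch: it uses a `java.nio.channels.Pipe` as a stand-in for a socket, and the class and method names (`ShortReadDemo`, `readTwice`) are made up for illustration. It assumes the written data is small enough to arrive in one piece.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class ShortReadDemo {

    /**
     * Writes `available` bytes into a pipe, waits until the non-blocking
     * source is readable, then reads twice into a buffer of `capacity` bytes.
     * Returns {first read result, immediate retry result}.
     */
    static int[] readTwice(int available, int capacity) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            sink.write(ByteBuffer.wrap(new byte[available]));
            selector.select(1000); // wait up to one second for readiness

            ByteBuffer buf = ByteBuffer.allocate(capacity);
            int first = source.read(buf);
            // If the buffer still has room, the first read was short, so an
            // immediate retry is expected to find nothing. If the buffer is
            // full, no retry is attempted and 0 is reported.
            int second = buf.hasRemaining() ? source.read(buf) : 0;
            return new int[] {first, second};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] r = readTwice(3, 8);
        System.out.println("first read = " + r[0] + ", immediate retry = " + r[1]);
    }
}
```

With 3 bytes available and an 8-byte buffer, the first read is short and the immediate retry should find nothing, which is the wasted call the answer warns about.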

If the read fills the buffer, it might make sense from the point of view of that one connection to keep reading until read() returns <= 0, but you are stealing cycles from the other channels. You need to consider fairness as well. In general you should probably do one read and keep iterating over the selected keys. If there is more data, the next select() will tell you.
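A rough sketch of the "one read per key, then move on" idea follows. The names `FairSelectLoop` and `selectOnce` are hypothetical, and the sketch assumes each key carries its own ByteBuffer as the attachment and that the caller drains that buffer between passes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class FairSelectLoop {

    /**
     * Runs one select pass and performs a single read() on each readable key.
     * Returns the total number of bytes read in this pass.
     * Note: a timeout of 0 makes Selector.select block indefinitely.
     */
    static int selectOnce(Selector selector, long timeoutMillis) throws IOException {
        int total = 0;
        if (selector.select(timeoutMillis) == 0) {
            return 0; // nothing ready within the timeout
        }
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected-key set is not cleared automatically
            if (!key.isValid() || !key.isReadable()) {
                continue;
            }
            ReadableByteChannel channel = (ReadableByteChannel) key.channel();
            ByteBuffer buf = (ByteBuffer) key.attachment(); // assumed: one buffer per key
            int n = channel.read(buf); // exactly one read, even if more data is waiting
            if (n < 0) {
                key.cancel(); // end of stream: stop selecting on this channel
                channel.close();
            } else {
                total += n;
            }
        }
        return total;
    }
}
```

Because readiness is level-triggered, a channel that still has unread data shows up again on the next pass, so no data is lost by reading only once per key.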

Use big buffers.

This also means that it's wrong to clear the buffer before each read. You should get the data out with a flip/get/compact cycle; then the buffer is ready to be read into again and you don't risk losing data. This in turn implies that you need a buffer per connection.
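The flip/get/compact cycle might look roughly like the sketch below. The newline-delimited message format and the names `CompactRead` and `readOnce` are assumptions made for illustration; they are not part of the original answer. The buffer passed in is the per-connection buffer, and it is assumed to be larger than the longest line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class CompactRead {

    /**
     * Does one read into the connection's buffer, passes every complete
     * newline-terminated line to the handler, and keeps any trailing partial
     * line in the buffer for the next call. Returns the result of read().
     */
    static int readOnce(ReadableByteChannel channel, ByteBuffer buf,
                        Consumer<String> handler) throws IOException {
        int n = channel.read(buf); // appends after any bytes left over from last time
        buf.flip();                // switch from filling to draining
        int lineStart = buf.position();
        for (int i = lineStart; i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {          // absolute get: does not move position
                byte[] line = new byte[i - lineStart];
                buf.get(line);                 // relative get: consumes the line
                buf.get();                     // consume the newline itself
                handler.accept(new String(line, StandardCharsets.UTF_8));
                lineStart = i + 1;
            }
        }
        buf.compact();             // move the unconsumed tail to the front, back to filling
        return n;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64); // one buffer per connection
        for (String chunk : new String[] {"hello\nwor", "ld\n"}) {
            ReadableByteChannel ch = Channels.newChannel(
                    new ByteArrayInputStream(chunk.getBytes(StandardCharsets.UTF_8)));
            readOnce(ch, buf, line -> System.out.println("line: " + line));
        }
    }
}
```

Calling `clear()` instead of `compact()` here would discard the partial line `wor` from the first chunk, which is the data loss the answer refers to.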