
Java InputStream automatically splits socket messages

I am seeing really strange behavior in Java, and I can't tell whether it happens by design or by chance.

I have a socket connection to a server that sends me a response to a request. I am reading this response from the socket with the following loop, which is wrapped in a try-with-resources statement.

try (BufferedInputStream remoteInput = new BufferedInputStream(remoteSocket.getInputStream())) {
    final byte[] response = new byte[512];
    int bytes_read;
    // read() blocks until at least one byte is available, then returns
    // however many bytes it happened to receive (at most response.length)
    while ((bytes_read = remoteInput.read(response, 0, response.length)) != -1) {
        // message-parsing code which does not affect the behaviour
    }
}

According to my understanding, the read method fills as many bytes as possible into the byte array. The limiting factor is either the number of bytes received or the size of the array.

Unfortunately, this is not what is happening: the protocol I'm using answers my request with several smaller answers, which are sent one after another over the same socket connection.

In my case the read method always returns with exactly one of those smaller answers in the array. The length of the answers varies, but the 512 bytes that fit into the array are always enough. That means my array always contains only one message, and the rest/unneeded part of the array remains untouched.

If I intentionally make the byte array smaller than my messages, it returns several completely filled arrays and one last array containing the remaining bytes of the message.

(A 100-byte answer read with an array length of 30 returns three completely filled arrays and one with only 10 bytes used.)

An InputStream, or a socket connection in general, shouldn't interpret the transmitted bytes in any way, which is why I am very confused right now. My program is not aware of the protocol being used in any way. In fact, my entire program is only this loop plus the code needed to establish the socket connection.

If I could rely on this behavior it would make parsing the response extremely easy, but since I don't know what causes it in the first place, I don't know whether I can count on it.

The protocol I'm using is LDAP, but since my program is completely unaware of that, it shouldn't matter.

"According to my understanding, the read method fills as many bytes as possible into the byte array."

Your understanding is incorrect. The whole point of that method returning the "number of bytes read" is that it might return any number. And to be precise: with a blocking read, when the method returns it has read something, so it will return a number >= 1 (or -1 at end of stream).

In other words: you should never ever rely on read() reading a specific number of bytes. You always, always check the returned number; and if you are waiting for a certain amount of data, you have to handle that in your code (for example by buffering the bytes yourself until you have "enough" of them to proceed).
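As a minimal sketch of that idea (a hypothetical helper, not part of the original question or answer), the following method keeps calling read() until the requested number of bytes has arrived or the stream ends; DataInputStream.readFully() in the JDK does essentially the same thing:

import java.io.IOException;
import java.io.InputStream;

class StreamUtil {
    // Keeps calling read() until 'length' bytes have been collected or the
    // stream ends; a single read() call may return fewer bytes than requested.
    static int readFully(InputStream in, byte[] buffer, int offset, int length) throws IOException {
        int total = 0;
        while (total < length) {
            int read = in.read(buffer, offset + total, length - total);
            if (read == -1) {
                break; // end of stream before the requested amount arrived
            }
            total += read;
        }
        return total; // may be less than 'length' if the stream ended early
    }
}

The caller then checks the returned count instead of assuming the buffer was filled.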

The thing is: there is a whole, huge stack of elements involved in such a read operation: network, operating system, JVM. You can't control what exactly happens, and thus you cannot and should not build implicit assumptions like this into your code.

While you might see this behaviour on a given machine, especially over loopback, once you start using real networks and different hardware it can change.

If messages are sent with enough of a delay, and you read them fast enough, you will see one message at a time. However, if messages are written close enough together, or your reader is delayed in any way, you can get multiple messages in a single read.

Also, if a message is large enough, e.g. around the MTU or more, a single message can be broken up even if your buffer is more than large enough.
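Because of that, a reader normally has to frame the messages itself rather than trust read() boundaries. As an illustrative sketch only (LDAP actually uses BER length encoding; the 4-byte length prefix below is purely an assumption for the example), this is roughly how a length-prefixed message would be read back reliably:

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class FramedReader {
    // Reads one message framed as: 4-byte big-endian length, then the payload.
    // readFully() loops over read() internally, so it does not matter how the
    // network split or merged the underlying packets.
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();      // assumed length prefix (not LDAP's real framing)
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }
}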
