Erlang server, Java client - are TCP messages being split up?

Question

As the title says, I have a server written in Erlang and a client written in Java, and they communicate over TCP. The problem I am facing is that gen_tcp:recv apparently has no knowledge of when a "complete" message from the client has been received, and therefore "splits" it up into multiple messages.

This is an example of what I'm doing (incomplete code, trying to keep it to only the relevant parts):

Code

Erlang server

-module(server).
-export([start/1]).

-define(TCP_OPTIONS, [list, {packet, 0}, {active, false}, {reuseaddr, true}]).

start(Port) ->
   {ok, ListenSocket} = gen_tcp:listen(Port, ?TCP_OPTIONS),
   accept(ListenSocket).

accept(ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    spawn(fun() -> loop(Socket) end),
    accept(ListenSocket).

loop(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            io:format("Received: ~s~n", [Data]),
            loop(Socket);
        {error, closed} ->
            ok
    end.

Java client

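import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.Scanner;

public class Client {
    public static void main(String[] args) throws IOException {
        // Assumption: the server port is passed as the first command-line argument.
        int port = Integer.parseInt(args[0]);
        Socket connection = new Socket("localhost", port);
        DataOutputStream output = new DataOutputStream(connection.getOutputStream());
        Scanner sc = new Scanner(System.in);

        while (true) {
            // Sends the line without any terminator or length information.
            output.writeBytes(sc.nextLine());
        }
    }
}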

Result

Client

Hello!

Server

Received: H
Received: el
Received: lo!

I have been searching around and, if I understand it correctly, TCP has no knowledge of the size of messages, so you need to manually set some kind of delimiter.

What I don't get, though, is that the messages never seem to be split up if I write the client in Erlang instead, like this:

Erlang client

-module(client).
-export([start/1]).

start(Port) ->
    {ok, Socket} = gen_tcp:connect({127,0,0,1}, Port, []),
    loop(Socket).

loop(Socket) ->
    gen_tcp:send(Socket, io:get_line("> ")),
    loop(Socket).

Result

Client

Hello!

Server

Received: Hello!

This makes me wonder if it is something that can be fixed on the Java side? I have tried several combinations of different output streams, write methods and socket settings on the server side, but nothing solves the problem.

Also, there are loads of Erlang (chat) server examples around the net where they don't do any delimiter things, although those are often written in Erlang on both ends. Nevertheless, they seem to assume that the messages are received just like they are sent. Is that just bad practice, or is there some hidden information about message length when both the client and server are written in Erlang?

If delimiter checks are necessary, I am surprised I can't find much information on the subject. How can it be done in a practical way?

Thanks in advance!

Answer 1

"This makes me wonder if it is something that can be fixed on the Java side?"

No, absolutely not. Regardless of why you don't happen to see the problem with an Erlang client, if you aren't putting any sort of "message boundary" indication into the protocol, you will not be able to reliably detect whole messages. I strongly suspect that if you send a very large message with the Erlang client, you'll still see split messages.

You should either:

Use some sort of "end of message" sequence, e.g. a 0 byte if that wouldn't otherwise come up in your messages.
Prefix each message with the length of the message.

Aside from that, you aren't clearly differentiating between bytes and text at the moment. Your Java client is currently silently ignoring the top 8 bits of each char, for example. Rather than using DataOutputStream, I would suggest just using OutputStream, and then for each message:

Encode it as a byte array using a specific encoding, e.g.
```
byte[] encodedText = text.getBytes(StandardCharsets.UTF_8);
```
Write a length prefix to the stream (possibly as a 7-bit-encoded integer, or maybe just as a fixed width, e.g. 4 bytes). (Actually, sticking with DataOutputStream would make this bit simpler.)
Write the data.
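To make the point about bytes versus text concrete, here is a small sketch comparing what `DataOutputStream.writeBytes` puts on the wire with what an explicit UTF-8 encoding produces. The class and method names (`EncodingDemo`, `viaWriteBytes`, `viaUtf8`) are made up for illustration and are not part of the original client.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class EncodingDemo {
    // Bytes produced by DataOutputStream.writeBytes: one byte per char,
    // with the high-order 8 bits of each char discarded.
    static byte[] viaWriteBytes(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        out.writeBytes(text);
        out.flush();
        return sink.toByteArray();
    }

    // Bytes produced by an explicit, well-defined encoding.
    static byte[] viaUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String text = "\u20ac5"; // euro sign followed by '5'
        System.out.println(viaWriteBytes(text).length); // 2: the euro sign is mangled
        System.out.println(viaUtf8(text).length);       // 4: the text survives intact
    }
}
```

For plain ASCII input both methods give the same bytes, which is probably why the problem does not show up with a message like "Hello!".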

On the server side, you should "read a message" by reading the length, then reading the specified number of bytes.
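The write and read halves of such a length-prefixed protocol could look roughly like the sketch below. It assumes a fixed 4-byte big-endian length and UTF-8 text; the class and method names (`LengthPrefixed`, `writeMessage`, `readMessage`) are hypothetical, and the reader is written in Java only to show the idea, since the server in the question is Erlang.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class LengthPrefixed {
    // Writes one message as a 4-byte big-endian length followed by the UTF-8 payload.
    static void writeMessage(OutputStream out, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads one message: first the length, then exactly that many bytes.
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload); // keeps reading until the whole payload has arrived
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the socket in this sketch.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, "Hello!");
        writeMessage(wire, "Second message");

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(readMessage(in));
        System.out.println(readMessage(in));
    }
}
```

Both messages come back whole even though they sit back to back in the same byte stream. Real code would probably also reject negative or unreasonably large lengths before allocating the buffer.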

You can't get around the fact that TCP is a stream-based protocol. If you want a message-based protocol, you really do have to put that on top yourself. (I'm sure there are helpful libraries to do this, of course, but you shouldn't just leave it up to TCP and hope.)

Answer 2

You need to define a protocol between your server and your client to split the TCP stream into messages. The TCP stream is divided into packets, but there is no guarantee that these match your calls to send/write or recv/read.

A simple and robust solution is to prefix all messages with a length. Erlang can do this transparently with the {packet, 1|2|4} option, where the prefix is encoded on 1, 2 or 4 bytes. You will have to perform the encoding on the Java side. If you opt for 2 or 4 bytes, please be aware that the length should be encoded in big-endian format, the same byte order used by the DataOutputStream.writeShort(int) and DataOutputStream.writeInt(int) Java methods.
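As a rough illustration of the byte layout for the 2-byte case, the sketch below builds one frame in memory so the prefix can be inspected. The class name `PacketTwoFrame` and its `frame` method are invented for this example, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the frame layout for a 2-byte length prefix.
final class PacketTwoFrame {
    static final int MAX_PAYLOAD = 0xFFFF; // largest length a 2-byte prefix can express

    // Returns the 2-byte big-endian length followed by the UTF-8 payload.
    static byte[] frame(String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > MAX_PAYLOAD) {
            throw new IllegalArgumentException("payload too large for a 2-byte prefix");
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        out.writeShort(payload.length); // high byte first (big-endian)
        out.write(payload);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] framed = frame("Hello!");
        // First two bytes are the length (0, 6), then the six payload bytes.
        System.out.println(framed[0] + " " + framed[1] + " total=" + framed.length);
    }
}
```

The size check matters because `writeShort` silently keeps only the low 16 bits of its argument, so an oversized message would otherwise go out with a wrong length.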

However, it seems from your implementations that you do have an implicit protocol: you want the server to process each line separately.

This is fortunately also handled transparently by Erlang. You simply need to pass the {packet, line} option. You might need to adjust the receive buffer, however, as lines longer than this buffer will be truncated. This can be done with the {recbuf, N} option.

So just redefining your options should do what you want, as long as the client ends each message with a newline.

-define(MAX_LINE_SIZE, 512).
-define(TCP_OPTIONS, [list, {packet, line}, {active, false}, {reuseaddr, true}, {recbuf, ?MAX_LINE_SIZE}]).
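One detail on the Java side: `Scanner.nextLine()` returns the line without its terminator, so a client that forwards that string unchanged never sends the newline that line-based framing waits for. A minimal sketch of a sender that adds it back is shown below; the `LineSender` class and `sendLine` method are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
final class LineSender {
    // Sends one message terminated by '\n', the delimiter a line-based receiver looks for.
    // Assumes the message itself contains no newline.
    static void sendLine(OutputStream out, String message) throws IOException {
        if (message.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("message must not contain a newline");
        }
        out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the socket's output stream.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendLine(wire, "Hello!");
        System.out.println(wire.size()); // 7: six characters plus the newline
    }
}
```

In the client from the question, the loop body would then become something like `sendLine(connection.getOutputStream(), sc.nextLine())`.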

Answer 3

As Jon said, TCP is a streaming protocol and has no concept of a message in the sense that you are looking for. The data is often broken up based on your rate of reading, kernel buffer size, MTU of the network, etc. There are no guarantees that you don't get your data 1 byte at a time.
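The following sketch simulates that worst case with an in-memory stream that hands over at most one byte per read call, and shows the kind of loop a receiver needs to reassemble a known number of bytes. The class `ChunkedReads` and its methods are invented for this illustration; a real socket would decide the chunk sizes itself.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class ChunkedReads {
    // Simulates a connection that delivers at most one byte per read call.
    static InputStream oneByteAtATime(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    // Keeps reading until exactly 'length' bytes have been collected.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int filled = 0;
        while (filled < length) {
            int n = in.read(buffer, filled, length - filled);
            if (n < 0) {
                throw new EOFException("stream ended after " + filled + " bytes");
            }
            filled += n;
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = oneByteAtATime("Hello!".getBytes(StandardCharsets.UTF_8));
        byte[] first = new byte[6];
        System.out.println(in.read(first, 0, 6)); // a single read yields only 1 byte
        System.out.println(new String(readExactly(in, 5), StandardCharsets.UTF_8));
    }
}
```

`DataInputStream.readFully` performs the same kind of loop, which is why it pairs naturally with a length prefix.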

The easiest change to make to your app to get what you want is to change {packet,0} to {packet,4} in the Erlang server's TCP_OPTIONS

and change the Java writer code to:

while(true) {
   byte[] data = sc.nextLine().getBytes(StandardCharsets.UTF_8); // or leave out the UTF_8 for default platform encoding
   output.writeInt(data.length);
   output.write(data,0,data.length);
}

You should find that you receive exactly the right message.

You should also add {packet,4} to the Erlang client if you make this change on the server side, as the server now expects a 4-byte header indicating the size of the message.

Note: the {packet,N} option is transparent in Erlang code: the client doesn't need to send the int, and the server doesn't see the int. Java doesn't have an equivalent of size framing in the standard library, so you have to write the int size yourself.

3 answers

Answer 1
4 2014-05-18 16:32:27

Answer 2
3 accepted 2014-05-18 17:32:46

Answer 3
1 2014-05-18 16:51:49
