Erlang server, Java client - TCP messages get split?

Question

As the title says, I have a server written in Erlang, a client written in Java and they are communicating through TCP. The problem that I am facing is the fact that gen_tcp:recv apparently has no knowledge of when a "complete" message from the client has been received, and is therefore "splitting" it up in multiple messages.

This is an example of what I'm doing (Incomplete code, trying to keep it to only the relevant parts):

Code

Erlang server

-module(server).
-export([start/1]).

-define(TCP_OPTIONS, [list, {packet, 0}, {active, false}, {reuseaddr, true}]).

start(Port) ->
   {ok, ListenSocket} = gen_tcp:listen(Port, ?TCP_OPTIONS),
   accept(ListenSocket).

accept(ListenSocket) ->
    {ok, Socket} = gen_tcp:accept(ListenSocket),
    spawn(fun() -> loop(Socket) end),
    accept(ListenSocket).

loop(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Data} ->
            io:format("Received: ~s~n", [Data]),
            loop(Socket);
        {error, closed} ->
            ok
    end.

Java client

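import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.Scanner;

public class Client {
    private static final int PORT = 8080; // placeholder value: must match the port the server listens on

    public static void main(String[] args) throws IOException {
        Socket connection = new Socket("localhost", PORT);
        DataOutputStream output = new DataOutputStream(connection.getOutputStream());
        Scanner sc = new Scanner(System.in);

        while (true) {
            // nextLine() strips the line terminator; writeBytes sends the low 8 bits of each char
            output.writeBytes(sc.nextLine());
        }
    }
}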

Result

Client

Hello!

Server

Received: H
Received: el
Received: lo!

I have been searching around and if I understand it correctly, TCP has no knowledge of the size of messages, and you need to manually set some kind of delimiter.

What I don't get though, is that the messages never seem to split up if I write the client in Erlang instead, like this:

Erlang client

-module(client).
-export([start/1]).

start(Port) ->
    {ok, Socket} = gen_tcp:connect({127,0,0,1}, Port, []),
    loop(Socket).

loop(Socket) ->
    gen_tcp:send(Socket, io:get_line("> ")),
    loop(Socket).

Result

Client

Hello!

Server

Received: Hello!

This makes me wonder if it is something that can be fixed on the Java side? I have tried several combinations of different output streams, write methods and socket settings on the server side, but nothing solves the problem.

Also, there are loads of Erlang (chat) server examples around the net where they don't do any delimiter things, although those are often written in Erlang on both ends. Nevertheless, they seem to assume that the messages are received just like they are sent. Is that just bad practice, or is there some hidden information about message length when both the client and server are written in Erlang?

If delimiter checks are necessary, I am surprised I can't find much information on the subject. How can it be done in a practical way?

Thanks in advance!

Answer 1

This makes me wonder if it is something that can be fixed on the Java side?

No, absolutely not. Regardless of why you don't happen to see the problem with an Erlang client, if you aren't putting any sort of "message boundary" indication into the protocol, you will not be able to reliably detect whole messages. I strongly suspect that if you send a very large message with the Erlang client, you'll still see split messages.

You should either:

Use some sort of "end of message" sequence, eg a 0 byte if that wouldn't otherwise come up in your messages.
Prefix each message with the length of the message.
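As a rough illustration of the first option, here is a minimal Java sketch of reading 0-terminated messages from a stream. The names DelimitedReader and readMessage are made up for this example, and it assumes UTF-8 text that never contains a 0 byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class DelimitedReader {
    /**
     * Collects bytes until the next 0 byte and returns them as UTF-8 text.
     * Returns null if the stream ends before a terminator is seen, so a
     * trailing partial message is discarded.
     */
    static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
            buffer.write(b);
        }
        return null; // end of stream without a complete message
    }
}
```

It does not matter how TCP chops up the bytes in transit: the reader keeps accumulating until it sees the terminator. For real traffic you would probably wrap the socket stream in a BufferedInputStream rather than read one byte at a time from the socket.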

Aside from that, you aren't clearly differentiating between bytes and text at the moment. Your Java client is currently silently ignoring the top 8 bits of each char, for example. Rather than using DataOutputStream, I would suggest just using OutputStream, and then for each message:

Encode it as a byte array using a specific encoding, eg

 byte[] encodedText = text.getBytes(StandardCharsets.UTF_8);

Write a length prefix to the stream (possibly in a 7-bit-encoded integer, or maybe just as a fixed width, eg 4 bytes). (Actually, sticking with DataOutputStream would make this bit simpler.)
Write the data
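For the 7-bit-encoded length mentioned in the steps above, a minimal sketch might look like the following. The class and method names (VarintLength, writeLength, readLength) are hypothetical, and the layout assumed here is the common "low 7 bits first, high bit means more bytes follow" scheme. Note that Erlang's built-in {packet, 1|2|4} framing uses fixed-width prefixes, so a variable-width prefix like this would have to be decoded by hand on the server.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating a 7-bit-encoded (varint) length prefix.
final class VarintLength {
    /** Writes a non-negative length, 7 bits per byte, least significant group first. */
    static void writeLength(OutputStream out, int length) throws IOException {
        if (length < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80); // high bit set: more bytes follow
            length >>>= 7;
        }
        out.write(length); // last byte, high bit clear
    }

    /** Reads a length written by writeLength. */
    static int readLength(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("stream ended inside the length prefix");
            }
            if (shift == 28 && (b & 0x78) != 0) {
                throw new IOException("length prefix does not fit in a non-negative int");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("length prefix longer than 5 bytes");
    }
}
```

Short messages cost a single prefix byte this way, at the price of a slightly more involved decoder than a fixed 4-byte prefix.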

On the server side, you should "read a message" by reading the length, then reading the specified number of bytes.
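To make "read the length, then read that many bytes" concrete, here is a minimal Java sketch of both directions using a fixed 4-byte prefix. The server in the question is written in Erlang, so treat this only as an illustration of the idea (it is also what a Java peer would need in order to read framed replies). LengthPrefixedFraming, its method names, and the MAX_MESSAGE_SIZE limit are assumptions made for this example.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating 4-byte length-prefixed framing.
final class LengthPrefixedFraming {
    // Assumed sanity limit so a corrupt prefix cannot trigger a huge allocation.
    static final int MAX_MESSAGE_SIZE = 1 << 20;

    static void writeMessage(DataOutputStream out, String text) throws IOException {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length); // 4-byte big-endian length
        out.write(payload);
        out.flush();
    }

    static String readMessage(DataInputStream in) throws IOException {
        int length = in.readInt(); // blocks until all 4 prefix bytes have arrived
        if (length < 0 || length > MAX_MESSAGE_SIZE) {
            throw new IOException("implausible message length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // blocks until the whole payload has arrived
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

The important call is readFully: a plain read(byte[]) may return fewer bytes than requested, which is exactly the splitting seen in the question.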

You can't get around the fact that TCP is a stream-based protocol. If you want a message-based protocol, you really do have to put that on top yourself. (I'm sure there are helpful libraries to do this, of course - but you shouldn't just leave it up to TCP and hope.)

Answer 2

You need to define a protocol between your server and your client to split the TCP stream into messages. A TCP stream is divided into packets, but there is no guarantee that these match your calls to send/write or recv/read.

A simple and robust solution is to prefix all messages with a length. Erlang can do this transparently with the {packet, 1|2|4} option, where the prefix is encoded on 1, 2 or 4 bytes. You will have to perform the encoding on the Java side. If you opt for 2 or 4 bytes, please be aware that the length should be encoded in big-endian format, the same byte order used by the DataOutputStream.writeShort(int) and DataOutputStream.writeInt(int) Java methods.
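As a sketch of what the Java side could produce for a server using {packet, 2}, the following builds one frame with a 2-byte big-endian prefix. Packet2Framing and frame are names invented for this example; the size check reflects the fact that a 2-byte prefix cannot describe more than 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the bytes for one message with a 2-byte length prefix.
final class Packet2Framing {
    static byte[] frame(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too large for a 2-byte length prefix");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(payload.length); // writes the high byte first (big-endian)
            out.write(payload);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For "Hello!" this yields the two prefix bytes 0x00 0x06 followed by the six payload bytes. The resulting array can be written to the socket's OutputStream in a single write call.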

However, it seems from your implementations that you do have an implicit protocol: you want the server to process each line separately.

This is fortunately also handled transparently by Erlang. You simply need to pass the {packet, line} option. You might need to adjust the receive buffer, however, as lines longer than this buffer will be truncated. This can be done with the {recbuf, N} option.

So just redefining your options should do what you want.

-define(MAX_LINE_SIZE, 512).
-define(TCP_OPTIONS, [list, {packet, line}, {active, false}, {reuseaddr, true}, {recbuf, ?MAX_LINE_SIZE}]).
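One thing to watch on the Java side: Scanner.nextLine() returns the line without its terminator, so the client as posted never sends a newline, and a server in {packet, line} mode would have nothing to split on. A minimal sketch of a fix, with LineFraming and writeLine being hypothetical names, is to append the newline explicitly:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends one newline-terminated line.
final class LineFraming {
    static void writeLine(OutputStream out, String line) throws IOException {
        if (line.indexOf('\n') >= 0) {
            // An embedded newline would be seen by the receiver as two messages.
            throw new IllegalArgumentException("line must not contain a newline");
        }
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```

In the client loop this would replace output.writeBytes(sc.nextLine()) with something like LineFraming.writeLine(output, sc.nextLine()). As far as I can tell, the data the server receives in line mode still includes the trailing newline, so the format string there may no longer need its own ~n.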

Answer 3

As Jon said, TCP is a streaming protocol and has no concept of a message in the sense that you are looking for. The stream is often broken up based on your rate of reading, the kernel buffer size, the MTU of the network, etc. There is no guarantee that you won't get your data 1 byte at a time.

The easiest change to make to your app to get what you want is to change {packet,0} to {packet,4} in the Erlang server's TCP_OPTIONS

and change the Java writer code to:

while(true) {
   byte[] data = sc.nextLine().getBytes(StandardCharsets.UTF_8); // or leave out the UTF_8 for default platform encoding
   output.writeInt(data.length);
   output.write(data,0,data.length);
}

You should find that you receive exactly the right message.

You should also add {packet,4} to the Erlang client if you make this change on the server side, as the server now expects a 4-byte header indicating the size of the message.

Note: the {packet,N} framing is transparent in Erlang code; the client doesn't need to send the int, and the server doesn't see the int. Java doesn't have an equivalent of size framing in the standard library, so you have to write the int size yourself.

3 answers

solution1
4 2014-05-18 16:32:27

solution2
3 ACCEPTED 2014-05-18 17:32:46

solution3
1 2014-05-18 16:51:49
