Java: what is the common way to include tags in a TCP/IP message

Posted 2015-05-01 14:02:30. Tags: java, tcp, tags, ascii

I designed a protocol for sending TCP/IP messages between peers in a peer-to-peer system. A message is a byte array in which the first byte indicates the wanted operation. The arguments follow.

To reconstruct the arguments I read the message byte by byte. Because there can be multiple arguments, I have to put tags between them (I call such a tag an end-of-argument byte).

What is the common way to include such tags?

Currently I use 1 byte to represent the end-of-argument tag (the value 17). It is important that I use one or more bytes that will never be contained in an argument (otherwise they will be interpreted as an end-of-argument byte).

At first I chose 17 as the end-of-argument byte because that is the ASCII value of "Device Control 1" (DC1). But now I'm not 100% sure that it will never be contained in an argument. Arguments are files (any possible file, for example txt or doc, but also an image or ...).

1 answer

You cannot insert separators without making assumptions about the data that will sit between them. If your protocol is to be as generic as possible, it has to support arbitrary byte arrays, and those can always collide with your separator bytes.
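To make the collision concrete, here is a minimal sketch of a naive splitter that cuts at every byte with the value 17, as in the question. The class and method names are made up for illustration, and the sketch covers only the argument part of a message (the leading operation byte is left out).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelimiterCollisionDemo {
    // The separator value from the question (ASCII DC1). Hypothetical constant name.
    static final byte END_OF_ARGUMENT = 17;

    // Naive splitter: ends an argument at every occurrence of the separator byte.
    static List<byte[]> split(byte[] arguments) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < arguments.length; i++) {
            if (arguments[i] == END_OF_ARGUMENT) {
                parts.add(Arrays.copyOfRange(arguments, start, i));
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // A single binary argument {1, 17, 2} followed by its terminator.
        byte[] arguments = {1, 17, 2, END_OF_ARGUMENT};
        // The payload byte 17 is mistaken for a terminator, so two parts come back.
        System.out.println(split(arguments).size()); // prints 2
    }
}
```

Any file that happens to contain the byte 17 (very likely for images and other binary data) is cut in the wrong place, which is why the format suggested next carries lengths instead of terminators.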

I suggest following the approach of the typical binary serialization formats out there (e.g. Avro). Since you don't have any kind of schema definition, you will need to adjust it a bit and carry type information inside the message, as Thrift or Protobuf do, but without a schema.

Try the following format:

[type1][length1][data1][type2][length2][data2]...[typeN][lengthN][dataN]

The type tag can be 4 bits, which gives you 16 types to assign. For example, type 1 could be String, 2 Image JPG, 3 Number long; it depends on your needs.

The length can be one byte, which lets you indicate lengths from 0 to 255. If you want larger lengths, you can say that if length == 255 the sequence continues: keep reading chunks of the same type until you find length < 255, which marks the last chunk for this argument. Data whose size is an exact multiple of 255 then ends with a chunk of length 0.
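The format can be sketched roughly as follows. This is only an illustration under a few assumptions: the type tag takes a full byte rather than 4 bits to keep the code short, 255 is used as the continuation marker because that is the largest value an unsigned byte can hold, and the names `ChunkedTlv`, `Argument`, `writeArgument` and `decode` are invented here. The operation byte from the question is not handled.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class ChunkedTlv {
    // A length byte equal to this value means "another chunk of the same argument follows".
    static final int CONTINUES = 255;

    static final class Argument {
        final int type;
        final byte[] data;
        Argument(int type, byte[] data) { this.type = type; this.data = data; }
    }

    // Appends one argument as a run of [type][length][data] chunks.
    static void writeArgument(ByteArrayOutputStream out, int type, byte[] data) {
        int offset = 0;
        while (true) {
            int chunk = Math.min(CONTINUES, data.length - offset);
            out.write(type);
            out.write(chunk);
            out.write(data, offset, chunk);
            offset += chunk;
            if (chunk < CONTINUES) {
                return; // a chunk shorter than 255 (possibly empty) ends the argument
            }
        }
    }

    // Splits a sequence of chunks back into arguments.
    static List<Argument> decode(byte[] message) {
        List<Argument> arguments = new ArrayList<>();
        int pos = 0;
        while (pos < message.length) {
            int type = message[pos] & 0xFF;
            ByteArrayOutputStream data = new ByteArrayOutputStream();
            int length;
            do {
                if (pos + 2 > message.length || (message[pos] & 0xFF) != type) {
                    throw new IllegalArgumentException("malformed chunk header at offset " + pos);
                }
                length = message[pos + 1] & 0xFF;
                if (pos + 2 + length > message.length) {
                    throw new IllegalArgumentException("truncated chunk at offset " + pos);
                }
                data.write(message, pos + 2, length);
                pos += 2 + length;
            } while (length == CONTINUES);
            arguments.add(new Argument(type, data.toByteArray()));
        }
        return arguments;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] file = new byte[300];
        java.util.Arrays.fill(file, (byte) 17); // payload full of the old separator value
        writeArgument(out, 2, file);
        writeArgument(out, 1, new byte[] {'h', 'i'});
        for (Argument a : decode(out.toByteArray())) {
            System.out.println("type " + a.type + ", " + a.data.length + " bytes");
        }
    }
}
```

The 300-byte argument is sent as one chunk of 255 bytes and one of 45, and its content is never inspected, so a payload consisting entirely of the value 17 round-trips without trouble. For large files a wider length field would mean fewer chunks; one byte is used here only because that is what the text describes.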

The advantage of this method is that you always know where the service bytes are and where the actual data is. So rather than indicating the end of an argument, you indicate its beginning and its length.

Later, if you are able to categorize your messages, you can include a schema tag. That lets you strip the type information from the messages and leave only the schema id and the length tags, which can potentially improve performance.