
Google ProtoBuf serialization / deserialization

I am reading about Google Protocol Buffers. I want to know whether I can serialize a C++ object, send it over the wire to a Java server, deserialize it there in Java, and introspect the fields.

More generally, I want to send objects from any language to a Java server and deserialize them there.

Assume the following is my .proto file:

message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

I ran protoc on this and generated a C++ class. Now I want to send the serialized stream to the Java server.

On the Java side, can I deserialize the stream so that I can find out that there are 3 fields in the stream, along with each field's name, type, and value?

You will need to know the schema in advance. Firstly, protobuf does not transmit names; all it uses as identifiers is the numeric key (1, 2 and 3 in your example) of each field. Secondly, it does not explicitly specify the type; there are only a very few wire types in protobuf (varint, 32-bit, 64-bit, length-prefix, group); actual data types are mapped onto those, but you cannot unambiguously decode data without the schema:

  • varint is "some form of integer", but could be signed, unsigned or "zigzag" (which allows negative numbers of small magnitude to be cheaply encoded), and could be intended to represent any width of data (64 bit, 32 bit, etc)
  • 32-bit could be an integer, but could be signed or unsigned - or it could be a 32-bit floating-point number
  • 64-bit could be an integer, but could be signed or unsigned - or it could be a 64-bit floating-point number
  • length-prefix could be a UTF-8 string, a sequence of raw bytes (without any particular meaning), a "packed" set of repeated values of some primitive type (integer, floating point, etc), or could be a structured sub-message in protobuf format
  • groups - hoorah! this is always unambiguous! this can only mean one thing; but that one thing is largely deprecated by Google :(
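To make the ambiguity concrete, here is a minimal sketch in plain Python. It uses only the standard library and hand-decodes the bytes; it is not the protobuf library API, and the helper names (`read_varint`, `zigzag_decode`) are made up for illustration. It shows the same payload bytes yielding different values depending on which declared type you assume.

```python
import struct


def read_varint(data: bytes, pos: int = 0):
    """Decode one base-128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def zigzag_decode(n: int) -> int:
    """Undo zigzag encoding (the sint32 / sint64 interpretation)."""
    return (n >> 1) ^ -(n & 1)


# One single-byte varint payload, two plausible readings.
raw, _ = read_varint(b"\x03")
as_unsigned = raw               # reading it as uint32 / uint64
as_zigzag = zigzag_decode(raw)  # reading it as sint32 / sint64

# A ten-byte varint: a huge unsigned value, or -1 as a signed 64-bit integer.
raw64, _ = read_varint(b"\xff" * 9 + b"\x01")
as_uint64 = raw64
as_int64 = raw64 - (1 << 64) if raw64 >= (1 << 63) else raw64

# One 4-byte payload (the "32-bit" wire type), two plausible readings.
four = bytes.fromhex("0000803f")
as_fixed32 = struct.unpack("<I", four)[0]  # unsigned integer reading
as_float = struct.unpack("<f", four)[0]    # floating-point reading
```

Nothing in the bytes themselves says which reading is the intended one; only the schema does.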

So fundamentally: you need the schema. The encoded data does not include what you want. It does this to avoid wasting space - if the protocol assumes that the encoder and decoder both know what the message is meant to look like, then a lot less information needs to be sent.

Note, however, that the information that is included is enough to safely round-trip a message even if there are fields that are not expected; it is not necessary to know the name or type if you only need to re-encode it to pass it along / back.
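As a rough illustration of that round-trip property, the sketch below splits a message into raw records using only the wire type, then joins them back together without interpreting any payload. It is hand-rolled Python, not the protobuf library; `split_records` is a hypothetical helper, field 9 is an invented "unexpected" field, and groups are deliberately not handled.

```python
def read_varint(data: bytes, pos: int):
    """Decode one base-128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_records(data: bytes):
    """Split a message into (field number, raw bytes) records.

    The wire type alone says how many bytes each record occupies,
    so no names or declared types are needed.
    """
    records = []
    pos = 0
    while pos < len(data):
        start = pos
        key, pos = read_varint(data, pos)
        wire_type = key & 7
        if wire_type == 0:    # varint
            _, pos = read_varint(data, pos)
        elif wire_type == 1:  # 64-bit
            pos += 8
        elif wire_type == 2:  # length-prefix
            length, pos = read_varint(data, pos)
            pos += length
        elif wire_type == 5:  # 32-bit
            pos += 4
        else:
            raise ValueError("groups are not handled in this sketch")
        records.append((key >> 3, data[start:pos]))
    return records


# Person with id=42 and name="Alice", plus a field 9 the relay knows nothing about.
original = b"\x08\x2a\x12\x05Alice\x4a\x03\x01\x02\x03"
records = split_records(original)
forwarded = b"".join(raw for _, raw in records)
```

Here `forwarded` is byte-for-byte identical to `original`, including the field the relay could not have interpreted.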

What you can do is use the parser API to scan over the data to reveal that there are three fields: field 1 is a varint, field 2 is length-prefixed, field 3 is length-prefixed. You could make educated guesses about the data beyond that (for example, you could see whether a UTF-8 decode produces something that looks roughly text-like, and verify that UTF-8 encoding it again gives you back the original bytes; if it does, it is possible it is a string).
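A minimal sketch of such a scan, written by hand in Python rather than with any real parser API: `scan` and `might_be_text` are hypothetical names, the sample bytes are a hand-encoded Person with made-up values, and groups are not handled.

```python
WIRE_TYPES = {0: "varint", 1: "64-bit", 2: "length-prefix", 5: "32-bit"}


def read_varint(data: bytes, pos: int):
    """Decode one base-128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def scan(data: bytes):
    """List (field number, wire type name, payload bytes) without a schema."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            start = pos
            _, pos = read_varint(data, pos)
            payload = data[start:pos]
        elif wire_type == 1:
            payload, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(data, pos)
            payload, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:
            payload, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError("groups are not handled in this sketch")
        fields.append((number, WIRE_TYPES[wire_type], payload))
    return fields


def might_be_text(payload: bytes) -> bool:
    """Educated guess only: valid UTF-8 that round-trips and looks printable."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return text.encode("utf-8") == payload and text.isprintable()


# Hand-encoded Person: id=42, name="Alice", email="a@b.c".
message = b"\x08\x2a\x12\x05Alice\x1a\x05a@b.c"
fields = scan(message)
shape = [(number, kind) for number, kind, _ in fields]
```

The scan recovers field numbers and wire types, and `might_be_text` can only say that a payload is plausibly a string; the names `id`, `name` and `email` never appear in the data.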

Can I serialize a C++ object, send it over the wire to a Java server, deserialize it there in Java, and introspect the fields?

Yes, it is the very goal of protobuf.

Serialize data in an application developed in any supported language, and deserialize data in an application developed in any supported language. Serialization and deserialization languages can be the same, or be different.

Keep in mind that protocol buffers are not self-describing, so both sides of your application need to have serializers/deserializers generated from the .proto file.

In short: yes you can.

You will need to create .proto files which define the data structures that you want to share. By using the Google Protocol Buffers compiler you can then generate interfaces and (de)serialization code for your structures for both Java and C++ (and almost any other language you can think of).

To transfer your data over the wire you can use, for instance, ZeroMQ, which is an extremely versatile communications framework that also sports a slew of different language APIs, among them Java and C++.

See this question for more details.
