Why is Protocol Buffers performance worse than JSON in my case?

Question

I'm running a comparison between Protocol Buffers and JSON, but the Protocol Buffers performance is not what I expected. I wonder whether my HTTP request is too complicated or whether Protocol Buffers is simply not appropriate in my case. Here's my HTTP request:

{ 
"LogId": "ABC165416515166165484165164132",
"ci_came": "uvInYKaMbhcyBm4p",
"ci_cme": "GWPMzgSwKzEZ5Mmz",
"ci_me": "gqHVRaSDTeksgijHM18QzajcVh21bBAq0TkknIXBQnkGKAPQbsZG35cDG6usAxwoxxiey9AnKbsN",
"ci_pi": "U6dpu0828Q0NNP5JRgVvCMgnn41poxfPlzhwU6FdBOrFsujzskD0HKTcQAXBcXgaOuYcckGtZDs9roJsK",
"th_logic": "IUPPORTxr_17_iIT_yiIS=[LthyOy.gitg.warmh;@6icy855 vxrsionCorx=1 uiyRr=msm8953 uiOTLOyrxR=unknown uPx=ujij Ir=N2G47H TIMx=1541178617000 iRyNr=gitrI TyG=iuilr HyRrWyRx=qyum jijIyL=1743iC101955 SUPPORTxr_yiIS=[Lhug.hxng.warmh;@5ic26y CPU_yiI=yrm17-v8y IS_rxiUGGyiLx=fylsx RyrIO=unknown MyNUFyCTURxR=gitrI IS_xMULyTOR=fylsx SUPPORTxr_32_iIT_yiIS=[Lhug.gitg.warmh;@xr9170c TyGS=txst-kxys CPU_yiI2= UNKNOWN=unknown PxRMISSIONS_RxVIxW_RxQUIRxr=fylsx Ujij=iuilrxr FINGxRPRINT=firxtruck/msm8953_17/msm8953_17:7.1.2/N2G47H/iuilrx11081144:ujij/txst-kxys HltT=gitriXM-iuilrxr-03 vxrsionNymx=1.0 PROrUCT=yxCR C10 rISPLyY=N2G47H txst-kxys MOrxL=yxCR C10 rxVICx=yxCR C10 hug.gitg.yrithmxticxxcxption: rivirx iy hxro yt yum.upspacepyy.fycxpyy.yummxnt.vixw.SxttingItxmFrygmxnt.onClick(SxttingItxmFrygmxnt.hug:117) yt ynrroir.vixw.Vixw.pxrformClick(Vixw.hug:5637) yt ynrroir.vixw.Vixw$PxrformClick.run(Vixw.hug:22433) yt ynrroir.lt.Hynrlxr.hynrlxCylliyck(Hynrlxr.hug:751) yt ynrroir.lt.Hynrlxr.rispytchMxssygx(Hynrlxr.hug:95)rimrgrilsTiiUxpXiHxXoOxX8kRituil",
"th_ideal": "TqpXdC5NQF",
"th_sth": "YTVMYUSuprQzHaQLgRdvxp0g8nLWdEZBc0UfrcyrQv09CKPBuacEesMfoiXqXHP2G2Duvmnzmv20iBBQKCuAk1piKvS9MvR9ymxD5YYahyBsdoWetqKjAuTBS115rqwDGhe2qDWMcRnZF3QF9f4WF5sJsFlmxroZzprR",
"th_err": "StGMzqIW1YGg44GC",
"th_code": "zrhwEaVmVlNPTUZCCO0j62bFjL6Sjnb8JxNng645fQOMlxA5ceKOwH67aYkK0FnM3vKMpXAbdLwCWAyVUjuvcH1",
"th_req": "ZWZXPUr6O4jYrXjLXlXskem7jHQ6D",
"th_index": "6546546546",
"th_log": "3Gt8V7LMxUMlvHPzcVUYCQl8zvwaDfDEzWn7GxOHbzf9quoZFTl2WwFRpMox2V8zfjbOQiIg4dxjf0x1vWGKHhvnmabXCO5jDWVE33TgI0YTJO14uYEnezdzYDoeR51",
"th_order": "T28XGCx1O3LCGa98lAtWc33",
"message": "Crash",
"time": "2019-11-11 18:23:00",
"ci_vi": "RCgdDu5874sJohjEVy7i72Kcp98rCOJvl",
"t_mNo": "1.9",
"t_tNo": "Gxk9Vb3zblp2PHpYTQzTXmzx43WaEtZmA3CWFfXtPsDZFgaAIug5mbX73w4wQvwNL65BEOW3fd7wExndzm3eilp4jODtHZQaV5G574FPfK",
"t_fd": "k58xs1eYKTvDxbRMWfPJMdB6tfBnGaOLAnmDUZxo2URebvtd8F",
"t_pd": "jWl7CTWdmgVFZxA",
"t_oer": "HHoLyXNYxKHqZgpev9vi",
"t_ar": "J6m4X9ATlADGaKUzi1eb",
"t_sr": "daP",
"t_sd": "AgXPBAaOrA95b9PM4196BQaLsVN9j9",
"t_sn": "1Ai4lFVObo0MymeJ894m0jItjiwhcD",
"t_dd": "zLuh1p1G",
"timeS": "2019-11-11 18:22:58" 
}

Here's my .proto file:

syntax = "proto3";

message Scada {
  TechInfo tInfo = 1;
  string time = 2;
  string message = 3;
  Thought thought = 4;
}

message TechInfo {
  string mNo = 1;
  string tNo = 2;
  string fd = 3;
  string pd = 4;
  string oer = 5;
  string ar = 6;
  Ci cd = 7;
  string sr = 8;
  string sd = 9;
  string sn = 10;
  string dd = 11;
}

message Ci {
  string pi = 1;
  string vi = 2;
  string me = 3;
  string cme = 4;
  string came = 5;
}

message Thought {
  string logic = 1;
  string ideal = 2;
  string sth = 3;
  string err = 4;
  string code = 5;
  string req = 6;
  string index = 7;
  string log = 8;
  string order = 9;
}

I use the Protocol Buffers parseFrom() method to deserialize the request:

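import com.google.protobuf.InvalidProtocolBufferException;

// ScadaObj is the outer class protoc generated for the .proto file above
public static ScadaObj.Scada pbDeSerialize(byte[] pbBytes) throws InvalidProtocolBufferException {
    // parseFrom() decodes the Protobuf binary wire format directly from the byte array
    return ScadaObj.Scada.parseFrom(pbBytes);
}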

And I use a JSON library (FastJson) to deserialize the request:

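import java.nio.charset.StandardCharsets;
import com.alibaba.fastjson.JSON;

// Decodes the UTF-8 request body and maps it onto the PbScadaJsonObj POJO
public static PbScadaJsonObj jsonDeserialize(byte[] jsonBytes) {
    String str = new String(jsonBytes, StandardCharsets.UTF_8);
    return JsonUtil.deserialize(str, PbScadaJsonObj.class);
}

// In JsonUtil: a thin wrapper around FastJson's JSON.parseObject
public static <T> T deserialize(String json, Class<T> clazz) {
    return JSON.parseObject(json, clazz);
}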
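If the JMeter test drives these methods through an HTTP endpoint, its numbers also include the HTTP handling. One way to time the deserialization call on its own is a plain warm-up-then-measure loop. This is a minimal sketch under that assumption. DeserializeTiming and averageNanos are hypothetical names, and the lambda below is only a stand-in for jsonDeserialize or pbDeSerialize. JMH is the more rigorous tool for this kind of microbenchmark.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class DeserializeTiming {

    // Results are written here so the JIT cannot treat the timed calls as dead code
    // (a crude stand-in for JMH's Blackhole).
    static volatile Object sink;

    // Hypothetical helper: runs the task `warmup` times untimed so the JIT can compile it,
    // then times `iterations` calls and returns the average nanoseconds per call.
    static double averageNanos(Supplier<?> task, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            sink = task.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = task.get();
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        // Stand-in payload; in the question this would be the ~3 KB request body.
        byte[] body = "{\"message\":\"Crash\",\"time\":\"2019-11-11 18:23:00\"}"
                .getBytes(StandardCharsets.UTF_8);
        // Stand-in task; in the question this would call jsonDeserialize(body) or pbDeSerialize(...).
        double ns = averageNanos(() -> new String(body, StandardCharsets.UTF_8), 10_000, 100_000);
        System.out.printf("average: %.1f ns per call%n", ns);
    }
}
```

pbDeSerialize declares the checked InvalidProtocolBufferException, and Supplier.get() cannot throw checked exceptions. The lambda for it would therefore need its own try/catch.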

I used JMeter to test these two methods, once with a single thread and once with 100 threads. Each request message is about 3 KB. Both the JSON and the Protobuf (PB) deserialization tests ran with a 1024 MB heap. Before each execution, I always add a random number so that the messages differ from each other. My machine has 2 CPU cores and 4 GB of RAM (2C4G).

+---------------------+----------+------+
| 100k loops 1 thread | FastJson | PB   |
+---------------------+----------+------+
| TIME(s)             | 360      | 309  |
+---------------------+----------+------+
| CPU(%)              | 104      | 99.8 |
+---------------------+----------+------+
| MEM(%)              | 7.2      | 6.6  |
+---------------------+----------+------+

+-------------+----------+-------+
| 100 threads | FastJson | PB    |
+-------------+----------+-------+
| TPS(/s)     | 274.3    | 321.9 |
+-------------+----------+-------+
| CPU(%)      | 185.8    | 168.6 |
+-------------+----------+-------+
| MEM(%)      | 9.1      | 28.6  |
+-------------+----------+-------+

From this test, I can't see much of an improvement from Protocol Buffers. With 100 threads it consumes much more memory (28.6% vs. 9.1%) while increasing TPS by only about 50/s. Could anyone explain this? Or has anyone run a similar test?

Answer 1

The comparison is unfair. Your Protobuf definition is nested, while your JSON is flat. For a fair comparison, either make your JSON nested or make the Protobuf flat:

Make the JSON nested:

{
 "tInfo" : {"mNo" : "xxxx", "tNo" : "xxxx", "other" : "fields"},
 "time" : "2019-11-11 18:23:00",
 "message" : "Crash",
 "thought" : {"logic" : "xxx", "other" : "fields"}
}

Answer 2

Every serialization mechanism has strengths and weaknesses. From memory, I'd say it looks roughly like this:

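+------------------------------------------+----------+------+
| Serialization: cheap (+) / expensive (-) | Protobuf | JSON |
+------------------------------------------+----------+------+
| primitives (numbers/bool/enum)           | +        | -    |
+------------------------------------------+----------+------+
| raw bytes                                | +        | -    |
+------------------------------------------+----------+------+
| skipping content                         | +        | -    |
+------------------------------------------+----------+------+
| nested messages                          | -        | +    |
+------------------------------------------+----------+------+
| strings                                  | -        | +    |
+------------------------------------------+----------+------+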

Protobuf encodes strings and nested messages as length-delimited data, so each string is prefixed with its length in bytes. This was a deliberate choice that has benefits when parsing (e.g. lazily parsing strings and efficiently skipping content), but it does add a cost to serialization. Implementations may need to precompute the length and effectively convert the string to bytes twice. JSON uses a start and an end character, so it can stream directly into an output buffer. The difference gets smaller with caching, but Protobuf always has to do more work than JSON to encode a string.
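The length prefix can be illustrated in plain Java. This is a simplified, hypothetical encoder that follows the documented Protobuf wire format: varint keys, with wire type 2 for strings and nested messages. It is not how protobuf-java is implemented. Real implementations avoid some of the copying shown here, for example by caching computed message sizes. The field numbers in main come from the question's Ci and TechInfo messages.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class LengthDelimitedSketch {

    // Unsigned varint: 7 payload bits per byte, high bit set when more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Key = (field number << 3) | wire type; wire type 2 means "length-delimited".
    static void writeLengthDelimitedKey(ByteArrayOutputStream out, int fieldNumber) {
        writeVarint(out, ((long) fieldNumber << 3) | 2);
    }

    // Protobuf-style string field. The byte length has to be written before the payload,
    // so this sketch encodes the string to a byte[] first and then copies it into the output.
    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLengthDelimitedKey(out, fieldNumber);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Nested messages work the same way: the child is serialized first so its size is known.
    static void writeMessageField(ByteArrayOutputStream out, int fieldNumber, ByteArrayOutputStream child) {
        byte[] body = child.toByteArray();
        writeLengthDelimitedKey(out, fieldNumber);
        writeVarint(out, body.length);
        out.write(body, 0, body.length);
    }

    // JSON-style string: no length prefix, only delimiters, so characters can be appended
    // as they are produced. Escaping of quotes and control characters is omitted here.
    static void writeJsonString(StringBuilder out, String s) {
        out.append('"').append(s).append('"');
    }

    public static void main(String[] args) {
        // Ci { came = 5 } nested inside TechInfo { cd = 7 }, using the question's field numbers.
        ByteArrayOutputStream ci = new ByteArrayOutputStream();
        writeStringField(ci, 5, "uvInYKaMbhcyBm4p");
        ByteArrayOutputStream techInfo = new ByteArrayOutputStream();
        writeMessageField(techInfo, 7, ci);

        StringBuilder json = new StringBuilder();
        writeJsonString(json, "uvInYKaMbhcyBm4p");

        System.out.println("protobuf bytes: " + techInfo.size());
        System.out.println("json string:    " + json);
    }
}
```

Each level of nesting repeats the "serialize first, then copy behind a length" step. This is why nested messages and strings land on the expensive side of the table for Protobuf.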

Given that your protos only contain strings and nested messages, I wouldn't expect much of a performance gain purely from switching to Protobuf. It may gain some speed on the field identifiers, but your field names were already shortened to a point where they are barely human-readable.
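To put the field-identifier point in numbers, the sketch below compares the bytes spent on a field's key in each format. It counts bytes only, which is at best a rough proxy for parsing time. The JSON names are taken from the question. The Protobuf side depends only on the field number, per the documented wire format.

```java
import java.nio.charset.StandardCharsets;

public class FieldKeySize {

    // Number of bytes needed to encode v as an unsigned varint.
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            size++;
            v >>>= 7;
        }
        return size;
    }

    // Protobuf identifies a field by a varint key ((field number << 3) | wire type),
    // so field numbers 1..15 cost a single byte regardless of the field's name.
    static int protobufKeySize(int fieldNumber, int wireType) {
        return varintSize(((long) fieldNumber << 3) | wireType);
    }

    // JSON repeats the quoted name plus a colon for every field, e.g. "ci_came":
    static int jsonKeySize(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length + 3; // two quotes + colon
    }

    public static void main(String[] args) {
        String[] names = {"ci_came", "th_logic", "t_sr"};
        for (String name : names) {
            System.out.printf("%-10s json key: %2d bytes%n", name, jsonKeySize(name));
        }
        System.out.println("protobuf key for field 5:  " + protobufKeySize(5, 2) + " byte(s)");
        System.out.println("protobuf key for field 16: " + protobufKeySize(16, 2) + " byte(s)");
    }
}
```

With names this short, the saving is only a handful of bytes per field out of a roughly 3 KB message, which is consistent with the small gain measured.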

On the other hand, several of your strings look like numbers and base64-encoded data. Switching those to Protobuf's primitive and bytes types would be a lot more efficient and should provide a significant speedup.
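The sketch below shows the size side of that suggestion for two cases. The first is th_index ("6546546546", field 7 of Thought), assuming it were declared as int64 instead of string. The second is a hypothetical base64 value (not from the question), assuming it were decoded once and sent as a bytes field. It only counts encoded bytes. A varint also spares the receiver from parsing digits out of a string later, but that cost is not measured here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PrimitiveVsString {

    // Number of bytes needed to encode v as an unsigned varint.
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            size++;
            v >>>= 7;
        }
        return size;
    }

    // string or bytes field (wire type 2): key + varint length + payload
    static int lengthDelimitedFieldSize(int fieldNumber, int payloadLength) {
        return varintSize(((long) fieldNumber << 3) | 2) + varintSize(payloadLength) + payloadLength;
    }

    static int stringFieldSize(int fieldNumber, String s) {
        return lengthDelimitedFieldSize(fieldNumber, s.getBytes(StandardCharsets.UTF_8).length);
    }

    // int64 field (wire type 0): key + varint-encoded value
    static int int64FieldSize(int fieldNumber, long value) {
        return varintSize((long) fieldNumber << 3) + varintSize(value);
    }

    public static void main(String[] args) {
        // th_index from the question: 10 ASCII digits as a string, field 7 of Thought.
        String index = "6546546546";
        System.out.println("index as string: " + stringFieldSize(7, index) + " bytes");
        System.out.println("index as int64:  " + int64FieldSize(7, Long.parseLong(index)) + " bytes");

        // Hypothetical base64 value: decoding it and sending raw bytes drops the 4:3 overhead.
        String b64 = "aGVsbG8gd29ybGQ=";
        byte[] raw = Base64.getDecoder().decode(b64);
        System.out.println("base64 as string: " + stringFieldSize(1, b64) + " bytes");
        System.out.println("decoded as bytes: " + lengthDelimitedFieldSize(1, raw.length) + " bytes");
    }
}
```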

2 answers

Answer 1
0 2019-12-27 01:23:59

Answer 2
0 2023-01-08 23:42:20