Why does Protocol Buffers perform worse than JSON in my case?

I'm doing a test comparing Protocol Buffers and JSON, but I find that the performance of Protocol Buffers falls short of my expectations. I wonder if my HTTP request is too complicated or if Protocol Buffers is just not appropriate for my case. Here's my HTTP request:

{ 
"LogId": "ABC165416515166165484165164132",
"ci_came": "uvInYKaMbhcyBm4p",
"ci_cme": "GWPMzgSwKzEZ5Mmz",
"ci_me": "gqHVRaSDTeksgijHM18QzajcVh21bBAq0TkknIXBQnkGKAPQbsZG35cDG6usAxwoxxiey9AnKbsN",
"ci_pi": "U6dpu0828Q0NNP5JRgVvCMgnn41poxfPlzhwU6FdBOrFsujzskD0HKTcQAXBcXgaOuYcckGtZDs9roJsK",
"th_logic": "IUPPORTxr_17_iIT_yiIS=[LthyOy.gitg.warmh;@6icy855 vxrsionCorx=1 uiyRr=msm8953 uiOTLOyrxR=unknown uPx=ujij Ir=N2G47H TIMx=1541178617000 iRyNr=gitrI TyG=iuilr HyRrWyRx=qyum jijIyL=1743iC101955 SUPPORTxr_yiIS=[Lhug.hxng.warmh;@5ic26y CPU_yiI=yrm17-v8y IS_rxiUGGyiLx=fylsx RyrIO=unknown MyNUFyCTURxR=gitrI IS_xMULyTOR=fylsx SUPPORTxr_32_iIT_yiIS=[Lhug.gitg.warmh;@xr9170c TyGS=txst-kxys CPU_yiI2= UNKNOWN=unknown PxRMISSIONS_RxVIxW_RxQUIRxr=fylsx Ujij=iuilrxr FINGxRPRINT=firxtruck/msm8953_17/msm8953_17:7.1.2/N2G47H/iuilrx11081144:ujij/txst-kxys HltT=gitriXM-iuilrxr-03 vxrsionNymx=1.0 PROrUCT=yxCR C10 rISPLyY=N2G47H txst-kxys MOrxL=yxCR C10 rxVICx=yxCR C10 hug.gitg.yrithmxticxxcxption: rivirx iy hxro yt yum.upspacepyy.fycxpyy.yummxnt.vixw.SxttingItxmFrygmxnt.onClick(SxttingItxmFrygmxnt.hug:117) yt ynrroir.vixw.Vixw.pxrformClick(Vixw.hug:5637) yt ynrroir.vixw.Vixw$PxrformClick.run(Vixw.hug:22433) yt ynrroir.lt.Hynrlxr.hynrlxCylliyck(Hynrlxr.hug:751) yt ynrroir.lt.Hynrlxr.rispytchMxssygx(Hynrlxr.hug:95)rimrgrilsTiiUxpXiHxXoOxX8kRituil",
"th_ideal": "TqpXdC5NQF",
"th_sth": "YTVMYUSuprQzHaQLgRdvxp0g8nLWdEZBc0UfrcyrQv09CKPBuacEesMfoiXqXHP2G2Duvmnzmv20iBBQKCuAk1piKvS9MvR9ymxD5YYahyBsdoWetqKjAuTBS115rqwDGhe2qDWMcRnZF3QF9f4WF5sJsFlmxroZzprR",
"th_err": "StGMzqIW1YGg44GC",
"th_code": "zrhwEaVmVlNPTUZCCO0j62bFjL6Sjnb8JxNng645fQOMlxA5ceKOwH67aYkK0FnM3vKMpXAbdLwCWAyVUjuvcH1",
"th_req": "ZWZXPUr6O4jYrXjLXlXskem7jHQ6D",
"th_index": "6546546546",
"th_log": "3Gt8V7LMxUMlvHPzcVUYCQl8zvwaDfDEzWn7GxOHbzf9quoZFTl2WwFRpMox2V8zfjbOQiIg4dxjf0x1vWGKHhvnmabXCO5jDWVE33TgI0YTJO14uYEnezdzYDoeR51",
"th_order": "T28XGCx1O3LCGa98lAtWc33",
"message": "Crash",
"time": "2019-11-11 18:23:00",
"ci_vi": "RCgdDu5874sJohjEVy7i72Kcp98rCOJvl",
"t_mNo": "1.9",
"t_tNo": "Gxk9Vb3zblp2PHpYTQzTXmzx43WaEtZmA3CWFfXtPsDZFgaAIug5mbX73w4wQvwNL65BEOW3fd7wExndzm3eilp4jODtHZQaV5G574FPfK",
"t_fd": "k58xs1eYKTvDxbRMWfPJMdB6tfBnGaOLAnmDUZxo2URebvtd8F",
"t_pd": "jWl7CTWdmgVFZxA",
"t_oer": "HHoLyXNYxKHqZgpev9vi",
"t_ar": "J6m4X9ATlADGaKUzi1eb",
"t_sr": "daP",
"t_sd": "AgXPBAaOrA95b9PM4196BQaLsVN9j9",
"t_sn": "1Ai4lFVObo0MymeJ894m0jItjiwhcD",
"t_dd": "zLuh1p1G",
"timeS": "2019-11-11 18:22:58" 
} 

Here's my proto file:

message Scada {
 TechInfo tInfo = 1;
 string time = 2;
 string message = 3;
 Thought thought = 4;
} 
message TechInfo {
 string mNo = 1;
 string tNo = 2;
 string fd = 3;
 string pd = 4;
 string oer = 5;
 string ar = 6;
 Ci cd = 7;
 string sr = 8;
 string sd = 9;
 string sn = 10;
 string dd = 11; 
} 
message Ci{
 string pi = 1;
 string vi = 2;
 string me = 3;
 string cme = 4;
 string came = 5; 
} 
message Thought{
 string logic = 1;
 string ideal = 2;
 string sth = 3;
 string err = 4;
 string code = 5;
 string req = 6;
 string index = 7;
 string log = 8;
 string order = 9; 
} 

And I use the Protocol Buffers parseFrom() method to deserialize the request:

public static Scada pbDeSerialize(byte[] pbBytes) throws InvalidProtocolBufferException {
    // Parse the generated Scada message directly from the raw request bytes.
    Scada scada = ScadaObj.Scada.parseFrom(pbBytes);
    return scada;
}

I use JSON tools to deserialize the request:

public static PbScadaJsonObj jsonDeserialize(byte[] jsonBytes) {
    // Decode the request bytes as UTF-8, then bind the JSON to the POJO.
    String str = new String(jsonBytes, utf8Charset);
    return JsonUtil.deserialize(str, PbScadaJsonObj.class);
}

public static <T> T deserialize(String json, Class<T> clazz) {
    return JSON.parseObject(json, clazz);
}

And I use JMeter to test these two methods. The test is run with one thread and then with 100 threads. One request message is about 3 KB. Both the JSON and Protocol Buffers (PB) deserialization methods are tested with a 1024 MB heap. Before each execution, I add a random number to make each message different from the others. My machine has 2 cores and 4 GB of RAM.

+---------------------+----------+------+
| 100k loops 1 thread | FastJson | PB   |
+---------------------+----------+------+
| TIME(s)             | 360      | 309  |
+---------------------+----------+------+
| CPU(%)              | 104      | 99.8 |
+---------------------+----------+------+
| MEM(%)              | 7.2      | 6.6  |
+---------------------+----------+------+

+-------------+----------+-------+
| 100 threads | FastJson | PB    |
+-------------+----------+-------+
| TPS(/s)     | 274.3    | 321.9 |
+-------------+----------+-------+
| CPU(%)      | 185.8    | 168.6 |
+-------------+----------+-------+
| MEM(%)      | 9.1      | 28.6  |
+-------------+----------+-------+

From the test, I can't see much improvement from Protocol Buffers: it consumes much more memory for an increase of only about 50 TPS. Could anyone explain this to me? Has anyone done a similar test?

The comparison is unfair. Your Protobuf definition is nested, while your JSON definition is flat. If you want to do a fair comparison, make your JSON nested or make the Protobuf flat:

Make JSON nested:

{
 "tInfo" : {"mNo" : "xxxx", "tNo" : "xxxx", "other" : "fields"},
 "time" : "2019-11-11 18:23:00",
 "message" : "Crash",
 "thought" : {"logic" : "xxx", "other" : "fields"}
} 
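
Or, going the other way, a flat proto mirroring your flat JSON keys would look roughly like this (just a sketch, with field numbers picked arbitrarily for illustration):

message FlatScada {
 string LogId = 1;
 string ci_came = 2;
 string ci_cme = 3;
 string ci_me = 4;
 string ci_pi = 5;
 string th_logic = 6;
 string message = 7;
 string time = 8;
 // ... and so on for the remaining th_*, t_* and timeS fields
}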

Every serialization mechanism has strengths and weaknesses. From memory I'd say it looks roughly like this:

+------------------------------------------+----------+------+
| Serialization: cheap (+) / expensive (-) | Protobuf | JSON |
+------------------------------------------+----------+------+
| primitives (numbers/bool/enum)           | +        | -    |
| raw bytes                                | +        | -    |
| skipping content                         | +        | -    |
| nested messages                          | -        | +    |
| strings                                  | -        | +    |
+------------------------------------------+----------+------+

Protobuf encodes strings and nested messages as length-delimited data, so each string is preceded by its length in bytes. This was a deliberate choice and has benefits when parsing (e.g. lazily parsing strings and efficiently skipping content), but it does add a cost to serialization. Implementations may need to precompute the length and effectively convert the string to bytes twice. JSON uses a start and end character, so it can stream directly into an output buffer. The difference gets smaller with caching, but Protobuf always has to do more work to encode a string than JSON does.
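
To make the length prefix concrete, here is a minimal sketch (my own example, assuming the protobuf-java runtime is on the classpath; it is not code from the question) that writes a single string field the way the generated serializer does:

import com.google.protobuf.CodedOutputStream;

public class WireFormatDemo {
    public static void main(String[] args) throws Exception {
        byte[] buf = new byte[16];
        CodedOutputStream out = CodedOutputStream.newInstance(buf);

        // Write field number 3 ("message" in the Scada proto) as a length-delimited string.
        // On the wire this is: tag 0x1a ((3 << 3) | 2), varint length 0x05, then "Crash" in UTF-8.
        out.writeString(3, "Crash");
        out.flush();

        for (int i = 0; i < out.getTotalBytesWritten(); i++) {
            System.out.printf("%02x ", buf[i]);   // prints: 1a 05 43 72 61 73 68
        }
    }
}

The length has to be known before the string bytes are written, which is exactly the extra work that JSON's quote delimiters avoid.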

Given that your protos only contain strings and nested messages, I wouldn't expect a lot of performance gains purely from switching to Protobuf. It may gain some speed on the field identifiers, but your field names were already shortened to a point where they are barely human readable.

On the other hand, several of your strings look like numbers and base64-encoded data. Switching those to Protobuf's primitive and bytes types would be a lot more efficient and should provide a significant speedup.
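
For example, here is a sketch of what the Thought message could look like; whether th_index is really a number and th_sth is really base64 data is my assumption from the sample values, so adjust to whatever the fields actually contain:

message Thought {
 string logic = 1;
 string ideal = 2;
 bytes sth = 3;    // assumption: base64 payload -> store the decoded bytes instead of the base64 string
 string err = 4;
 string code = 5;
 string req = 6;
 int64 index = 7;  // assumption: numeric -> "6546546546" fits in an int64 varint
 string log = 8;
 string order = 9;
}

With bytes the sender skips the base64 expansion entirely, and with int64 the value is encoded as a short varint instead of a ten-character string.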
