
Binary Protocol Parsing in Erlang

I am struggling a bit with extracting fields from a binary message. The raw message looks like this:

<<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>

I know the order, type and static sizes of the fields, though some of them have arbitrary sizes, so I am trying to do something like the following:

newobj(Data) ->
  io:fwrite("NewObj RAW ~p~n",[Data]),
  NewObj = {obj,rest(uint16(string(uint16({[],Data},id),type),parent),unparsed)},
  io:fwrite("NewObj ~p~n",[NewObj]),
  NewObj.

uint16/2, string/2 and rest/2 are the extraction functions, and they look like this:

uint16(ListData, Name) ->
  {List, Data} = ListData,
  case Data of
    <<Int:2/little-unsigned-unit:8, Rest/binary>> ->
      {List ++ [{Name,Int}], Rest};
    <<Int:2/little-unsigned-unit:8>> ->
      List ++ [{Name,Int}]
  end.
string(ListData, Name) ->
  {List, Data} = ListData,
  Split = binary:split(Data,<<0>>),
  String = lists:nth(1, Split),
  if
    length(Split) == 2 ->
      {List ++ [{Name, String}], lists:nth(2, Split)};
    true ->
      List ++ [{Name, String}]
  end.
rest(ListData, Name) ->
  {List, Data} = ListData,
  List ++ [{Name, Data}].

This works, and the output looks like this:

NewObj RAW <<1,0,97,98,99,100,0,0,0,3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>
NewObj {obj,[{id,1},
             {type,<<"abcd">>},
             {parent,0},
             {unparsed,<<3,0,0,0,0,0,0,0,0,0,3,32,3,0,0,88,2,0,0>>}]}

The reason for this question, though, is that passing {List, Data} as ListData and then destructuring it inside each function with {List, Data} = ListData feels clumsy. Is there a better way? I don't think I can use static matching, because the "unparsed" and "type" parts are of arbitrary length, so it's not possible to define their respective sizes.

Thanks!

---------------Update-----------------

Taking the comments below into account, the code now looks like this:

newobj(Data) ->
  io:fwrite("NewObj RAW ~p~n",[Data]),
  NewObj = {obj,field(
                field(
                field({[], Data},id,fun uint16/1),
                type, fun string/1),
                unparsed,fun rest/1)},
  io:fwrite("NewObj ~p~n",[NewObj]).

field({List, Data}, Name, Func) ->
  {Value,Size} = Func(Data),
  case Data of
    <<_:Size/binary-unit:8>> ->
      [{Name,Value}|List];
    <<_:Size/binary-unit:8, Rest/binary>> ->
      {[{Name,Value}|List], Rest}
  end.

uint16(Data) ->
  case Data of
    <<UInt16:2/little-unsigned-unit:8, _/binary>> ->
      {UInt16,2};
    <<UInt16:2/little-unsigned-unit:8>> ->
      {UInt16,2}
  end.

string(Data) ->
  Split = binary:split(Data,<<0>>),
  case Split of
    [String, Rest] ->
      {String,byte_size(String)+1};
    [String] ->
      {String,byte_size(String)+1}
  end.

rest(Data) ->
  {Data,byte_size(Data)}.

The code is not idiomatic, and some pieces cannot compile as is :-) Here are some comments:

  • The newobj/1 function refers to a NewObj variable that is unbound. Probably the real code is something like NewObj = {obj,rest(... ?

  • The code uses list append (++) multiple times. This should be avoided where possible, because every append copies its whole left-hand list, so building a list by repeated appends does a quadratic amount of copying. The idiomatic way is to add to the head of the list as many times as needed (that is: L2 = [NewThing | L1]) and call lists:reverse/1 at the very end. See any Erlang book, or the free Learn You Some Erlang, for the details.

  • In a similar vein, lists:nth/2 should be avoided and replaced by pattern matching, or by a different way of constructing the list or parsing the binary.

  • Dogbert's suggestion of doing the pattern matching directly in the function arguments is a good, idiomatic approach and lets you remove some lines from the code.
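Putting the last three bullets together, the original extractors could look roughly like the sketch below: the {Acc, Bin} tuple is destructured in the function head, each step prepends to the accumulator, and rest/2 calls lists:reverse/1 once at the end. This is a minimal sketch for the sample message only; the module name extract_sketch is hypothetical, and, like the original, it simply crashes (function_clause or badmatch) on truncated input.

```erlang
-module(extract_sketch).
-export([newobj/1]).

%% Same call chain as the question's newobj/1, without ++ or lists:nth/2.
newobj(Data) ->
    {obj, rest(uint16(string(uint16({[], Data}, id), type), parent), unparsed)}.

%% 2-byte little-endian unsigned integer (16 bits, same as 2/unit:8).
%% {Acc, Bin} is matched directly in the head instead of in the body.
uint16({Acc, <<Int:16/little-unsigned, Rest/binary>>}, Name) ->
    {[{Name, Int} | Acc], Rest}.

%% NUL-terminated string; crashes with badmatch if there is no NUL byte.
string({Acc, Bin}, Name) ->
    [String, Rest] = binary:split(Bin, <<0>>),
    {[{Name, String} | Acc], Rest}.

%% Whatever is left; the single reverse restores the field order.
rest({Acc, Bin}, Name) ->
    lists:reverse([{Name, Bin} | Acc]).
```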

As a last suggestion regarding debugging, consider replacing the io:fwrite calls with proper unit tests.

I hope this gives you some hints on what to look at. Feel free to append your code changes to the question, and we can proceed from there.

EDIT

It's looking better. Let's see if we can simplify it. Please note that we are working backwards: we are adding tests after the production code has been written, instead of doing test-driven development.

Step 1: add a test.

I also reversed the order of the list because it looks more natural.

-module(binparse_tests).
-include_lib("eunit/include/eunit.hrl").

happy_input_test() ->
    Rest = <<3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 32, 3, 0, 0, 88, 2, 0, 0>>,
    Input = <<1, 0,
              97, 98, 99, 100, 0,
              0, 0,
              Rest/binary>>,
    Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}]},
    ?assertEqual(Expected, binparse:newobj(Input)).

We can run this with rebar3 eunit, among other ways (see the rebar3 documentation; I suggest starting with rebar3 new lib mylib to create a skeleton).

Step 2: the absolute minimum

Your description is not enough to understand which fields are mandatory and which are optional, and whether there is always something more after the obj.

In the simplest possible case, all your code can be reduced to:

newobj(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    [Type, Rest2] = binary:split(Rest, <<0>>),
    <<Parent:16/little-unsigned, Rest3/binary>> = Rest2,
    {obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]}.

Quite compact :-)

I find the encoding of the string very bizarre: in a binary encoding, the string is NUL-terminated (which forces you to walk the binary looking for the terminator) instead of being encoded with, say, 2 or 4 bytes representing the length, followed by the string itself.
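For comparison, here is a sketch of how a length-prefixed string would be handled. The encoding is hypothetical (a 2-byte little-endian length followed by that many bytes) and is not what the protocol in the question uses; the point is that the bit syntax can use a size bound earlier in the same pattern, so no scan for a terminator is needed. The module and function names are illustrative.

```erlang
-module(lenprefix_sketch).
-export([parse_string/1, encode_string/1]).

%% Hypothetical encoding: <<Len:16/little, Str:Len/binary>>.
%% Len is bound first and then used as the size of Str in the same match.
parse_string(<<Len:16/little-unsigned, Str:Len/binary, Rest/binary>>) ->
    {ok, Str, Rest};
parse_string(_Truncated) ->
    {error, bad_string}.

%% Inverse operation, handy for building test inputs.
encode_string(Str) when is_binary(Str) ->
    <<(byte_size(Str)):16/little-unsigned, Str/binary>>.
```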

Step 3: input validation

Since we are parsing a binary, it is probably coming from outside our system. As such, the "let it crash" philosophy doesn't apply, and we have to perform full input validation.

I make the assumption that all fields are mandatory except unparsed, which can be empty.

missing_unparsed_is_ok_test() ->
    Input = <<1, 0,
              97, 98, 99, 100, 0,
              0, 0>>,
    Expected = {obj, [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, <<>>}]},
    ?assertEqual(Expected, binparse:newobj(Input)).

The simple implementation above passes it.

Step 4: malformed parent

We add the tests and make an API decision: the function will return an error tuple.

missing_parent_is_error_test() ->
    Input = <<1, 0,
              97, 98, 99, 100, 0>>,
    ?assertEqual({error, bad_parent}, binparse:newobj(Input)).

malformed_parent_is_error_test() ->
    Input = <<1, 0,
              97, 98, 99, 100, 0,
              0>>,
    ?assertEqual({error, bad_parent}, binparse:newobj(Input)).

We change the implementation to pass the tests:

newobj(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    [Type, Rest2] = binary:split(Rest, <<0>>),
    case Rest2 of
        <<Parent:16/little-unsigned, Rest3/binary>> ->
            {obj, [{id, Id}, {type, Type}, {parent, Parent}, {unparsed, Rest3}]};
        Rest2 ->
            {error, bad_parent}
    end.

Step 5: malformed type

The new tests:

missing_type_is_error_test() ->
    Input = <<1, 0>>,
    ?assertEqual({error, bad_type}, binparse:newobj(Input)).

malformed_type_is_error_test() ->
    Input = <<1, 0,
              97, 98, 99, 100>>,
    ?assertEqual({error, bad_type}, binparse:newobj(Input)).

We could be tempted to change the implementation as follows:

newobj(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    case binary:split(Rest, <<0>>) of
        [Type, Rest2] ->
            case Rest2 of
                <<Parent:16/little-unsigned, Rest3/binary>> ->
                    {obj, [
                        {id, Id}, {type, Type},
                        {parent, Parent}, {unparsed, Rest3}
                    ]};
                Rest2 ->
                    {error, bad_parent}
            end;
        [Rest] -> {error, bad_type}
    end.

This is an unreadable mess. Just adding functions doesn't help us either:

newobj(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    case parse_type(Rest) of
        {ok, {Type, Rest2}} ->
            case parse_parent(Rest2) of
                {ok, Parent, Rest3} ->
                    {obj, [
                        {id, Id}, {type, Type},
                        {parent, Parent}, {unparsed, Rest3}
                    ]};
                {error, Reason} -> {error, Reason}
            end;
        {error, Reason} -> {error, Reason}
    end.

parse_type(Bin) ->
    case binary:split(Bin, <<0>>) of
        [Type, Rest] -> {ok, {Type, Rest}};
        [Bin] -> {error, bad_type}
    end.

parse_parent(Bin) ->
    case Bin of
        <<Parent:16/little-unsigned, Rest/binary>> -> {ok, Parent, Rest};
        Bin -> {error, bad_parent}
    end.

This is the classic Erlang problem of nested conditionals.

Step 6: regaining sanity

Here is my approach. It is quite generic, so (I think) it applies to many domains. The overall idea is taken from backtracking, as explained in http://rvirding.blogspot.com/2009/03/backtracking-in-erlang-part-1-control.html

We create one function per parse step and pass them, as a list, to call_while_ok/3:

newobj(Bin) ->
    Parsers = [fun parse_id/1,
               fun parse_type/1,
               fun parse_parent/1,
               fun(X) -> {ok, {unparsed, X}, <<>>} end
              ],
    case call_while_ok(Parsers, Bin, []) of
        {error, Reason} -> {error, Reason};
        PropList -> {obj, PropList}
    end.

The function call_while_ok/3 is somewhat related to lists:foldl/3 and lists:filter/2:

call_while_ok([F], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, _NextSeed} -> lists:reverse([Value | Acc]);
        {error, Reason} -> {error, Reason}
    end;
call_while_ok([F | Fs], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, NextSeed} -> call_while_ok(Fs, NextSeed, [Value | Acc]);
        {error, Reason} -> {error, Reason}
    end.
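To make the relation to lists:foldl/3 concrete, here is a sketch of the same combinator written as a fold. It is an illustrative variant (hypothetical module name, no explicit Acc argument), not a drop-in from the answer: once a parser fails, the remaining parsers are not called, but the fold still walks to the end of the list carrying the error. Unlike call_while_ok/3, it also accepts an empty parser list and returns [].

```erlang
-module(call_fold_sketch).
-export([call_while_ok/2]).

%% Fold state is either {ok, Acc, Seed} or {error, Reason}.
call_while_ok(Parsers, Seed0) ->
    Step = fun(_F, {error, _} = Err) ->
                   %% A previous parser failed: skip this one.
                   Err;
              (F, {ok, Acc, Seed}) ->
                   case F(Seed) of
                       {ok, Value, NextSeed} -> {ok, [Value | Acc], NextSeed};
                       {error, _} = Err -> Err
                   end
           end,
    case lists:foldl(Step, {ok, [], Seed0}, Parsers) of
        {ok, Values, _LastSeed} -> lists:reverse(Values);
        {error, _} = Error -> Error
    end.
```

With this variant, newobj/1 would call call_while_ok(Parsers, Bin) instead of call_while_ok(Parsers, Bin, []). The explicit recursion in the answer is arguably clearer, and it stops immediately instead of skipping the remaining steps.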

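And here are the parsing functions. Note that their signature is always the same: each takes a binary and returns either {ok, {Key, Value}, Rest} or {error, Reason}, which is the contract call_while_ok/3 relies on: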

parse_id(Bin) ->
    <<Id:16/little-unsigned, Rest/binary>> = Bin,
    {ok, {id, Id}, Rest}.

parse_type(Bin) ->
    case binary:split(Bin, <<0>>) of
        [Type, Rest] -> {ok, {type, Type}, Rest};
        [Bin] -> {error, bad_type}
    end.

parse_parent(Bin) ->
    case Bin of
        <<Parent:16/little-unsigned, Rest/binary>> ->
            {ok, {parent, Parent}, Rest};
        Bin -> {error, bad_parent}
    end.
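For convenience, here are the Step 6 pieces assembled into one module named binparse (the name the tests above assume), so the tests can be run as they are. There is one assumed addition that is not in the answer: parse_id/1 returns {error, bad_id} for input shorter than two bytes instead of crashing with badmatch, in the spirit of the full input validation from Step 3. The other parse functions match in the function head instead of in a case expression, which does not change their behavior.

```erlang
-module(binparse).
-export([newobj/1]).

newobj(Bin) ->
    Parsers = [fun parse_id/1,
               fun parse_type/1,
               fun parse_parent/1,
               fun(X) -> {ok, {unparsed, X}, <<>>} end],
    case call_while_ok(Parsers, Bin, []) of
        {error, Reason} -> {error, Reason};
        PropList -> {obj, PropList}
    end.

%% Runs the parsers in order, threading the remaining binary through.
call_while_ok([F], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, _NextSeed} -> lists:reverse([Value | Acc]);
        {error, Reason} -> {error, Reason}
    end;
call_while_ok([F | Fs], Seed, Acc) ->
    case F(Seed) of
        {ok, Value, NextSeed} -> call_while_ok(Fs, NextSeed, [Value | Acc]);
        {error, Reason} -> {error, Reason}
    end.

%% Assumed addition: report a truncated id instead of crashing.
parse_id(<<Id:16/little-unsigned, Rest/binary>>) -> {ok, {id, Id}, Rest};
parse_id(_) -> {error, bad_id}.

parse_type(Bin) ->
    case binary:split(Bin, <<0>>) of
        [Type, Rest] -> {ok, {type, Type}, Rest};
        [_NoNul] -> {error, bad_type}
    end.

parse_parent(<<Parent:16/little-unsigned, Rest/binary>>) -> {ok, {parent, Parent}, Rest};
parse_parent(_) -> {error, bad_parent}.
```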

Step 7: homework

The list [{id, 1}, {type, <<"abcd">>}, {parent, 0}, {unparsed, Rest}] is a proplist (see the Erlang documentation of the proplists module), a structure that predates Erlang maps.

Have a look at the documentation for maps and see whether it makes sense to return a map instead.
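One possible take on the homework, as a sketch only: the same checks written with binary pattern matching in a single function that returns {obj, Map} instead of a proplist. The module name is hypothetical, and the {error, bad_id} case for input shorter than two bytes is an assumption not covered by the tests above.

```erlang
-module(binparse_map_sketch).
-export([newobj/1]).

newobj(<<Id:16/little-unsigned, Rest/binary>>) ->
    case binary:split(Rest, <<0>>) of
        %% NUL found and at least 2 bytes left for the parent.
        [Type, <<Parent:16/little-unsigned, Unparsed/binary>>] ->
            {obj, #{id => Id, type => Type,
                    parent => Parent, unparsed => Unparsed}};
        %% NUL found, but fewer than 2 bytes follow it.
        [_Type, _TooShort] ->
            {error, bad_parent};
        %% No NUL terminator at all.
        [_NoNul] ->
            {error, bad_type}
    end;
newobj(_TooShort) ->
    {error, bad_id}.
```

Callers then read fields with maps:get(type, Map) or a match such as #{type := Type} = Map, instead of proplists:get_value/2.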

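Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.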
