
How to optimize a data-flow parsing algorithm?

I need to implement a C++ library that parses the client-server conversation of some abstract protocol. I don't have a file containing the whole conversation; I have to parse it on the fly. I have to implement the following interface:

class parsing_class
{
  public:
  void on_data( const char* data, size_t len );
  //other functions
  private:
  size_t pos_;// current position in the data flow
  bool first_part_parsed_;
  bool second_part_parsed_;
  //... some more bool markers or something like vector< bool >
};

The data is passed to my class through the on_data function. The chunk length varies from one call to the next. I know the protocol's packet format and how the conversation should be organized, so I can judge from the current pos_ whether I have enough data to parse the Nth part. The current implementation looks like this:

void parsing_class::on_data( const char* data, size_t len )
{
   pos_ += len;
   if( pos_ > FIRST_PART_SIZE and !first_part_parsed_ )
     parse_first_part( data, len );
   if( pos_ > SECOND_PART_SIZE and !second_part_parsed_ )
     parse_second_part( data, len );
   //and so on..  
}

What I want is some tips on how to optimize this algorithm, perhaps by avoiding these numerous ifs (on_data may be called a great many times, and each call has to go through all of the checks).

You don't need all those bool flags and pos_, since they seem only to record how much of the conversation has already been handled so that you can continue with the next part.

How about the following: write a parse function for each part of the conversation:

bool parse_part_one(const char *data) {
    ... // parse the data
    next_fun = parse_part_two;
    return true;
}
bool parse_part_two(const char *data) {
    ... // parse the data
    next_fun = parse_part_three;
    return true;
}
...

and in your class, add a pointer to the current parse function, initially pointing at parse_part_one. Now all on_data has to do is call the current parse function:

bool success = next_fun(data);

Because each function sets the pointer to the next parse function, the next call of on_data invokes the next parse function automatically. No tests of where you are in the conversation are required.

If the value of len is critical (which I assume it is), pass it along as well, and return false to indicate that the part could not be parsed yet (don't update next_fun in that case either).
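The idea above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the class buffers incoming bytes in a std::string, each part has a fixed size (PART_ONE_SIZE and PART_TWO_SIZE are hypothetical per-part sizes, not the asker's FIRST_PART_SIZE and SECOND_PART_SIZE), "parsing" merely copies the bytes out, and the parse functions are private members called through a pointer to member function instead of the free functions shown earlier.

```cpp
#include <cstddef>
#include <string>

// Hypothetical per-part sizes; real values would come from the packet format.
constexpr std::size_t PART_ONE_SIZE = 4;
constexpr std::size_t PART_TWO_SIZE = 6;

class parsing_class {
public:
    void on_data(const char* data, std::size_t len) {
        // Assumption: chunks are accumulated, since a part may span several calls.
        buffer_.append(data, len);
        // Keep running the current state until it reports "need more data"
        // (returns false) or the conversation is finished (null pointer).
        while (next_fun_ != nullptr && (this->*next_fun_)()) {}
    }

    bool finished() const { return next_fun_ == nullptr; }
    const std::string& first_part() const { return first_; }
    const std::string& second_part() const { return second_; }

private:
    using parse_fun = bool (parsing_class::*)();

    bool parse_part_one() {
        if (buffer_.size() < PART_ONE_SIZE) return false;  // not enough data yet
        first_ = buffer_.substr(0, PART_ONE_SIZE);          // placeholder "parsing"
        buffer_.erase(0, PART_ONE_SIZE);
        next_fun_ = &parsing_class::parse_part_two;         // advance the state
        return true;
    }

    bool parse_part_two() {
        if (buffer_.size() < PART_TWO_SIZE) return false;   // state stays unchanged
        second_ = buffer_.substr(0, PART_TWO_SIZE);
        buffer_.erase(0, PART_TWO_SIZE);
        next_fun_ = nullptr;                                // nothing left to parse
        return true;
    }

    std::string buffer_;                                    // bytes not yet consumed
    parse_fun next_fun_ = &parsing_class::parse_part_one;   // current state
    std::string first_;
    std::string second_;
};
```

With these assumed sizes, feeding "AB", then "CDEF", then "GHIJ" leaves first_part() equal to "ABCD" and second_part() equal to "EFGHIJ". Each call to on_data performs one check in the current state's function; parts that have already been parsed are never tested again.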


 