How to parse string pairs over multiple lines?

I'd like to parse content like this:

tag = value
tag2 = value2
tag3 = value3

with the relaxation that values may span multiple lines, while comments that belong to the next tag are not treated as part of the current value. A tag line is recognized by starting at the very beginning of a line (no indentation) and not starting with the comment identifier '#'. So this:

tag = value
  value continuation
tag2 = value2
  value continuation2
# comment for tag3
tag3 = value3

should parse into the mapping:

tag : "value\nvalue continuation"
tag2 : "value2\nvalue continuation2"
tag3 : "value3"

How can I achieve this in a clean way? My current code for parsing one-line pairs looks something like this:

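// Classify each line: '#' starts a comment, an indented or empty line resets it,
// anything else must be a single "tag = value" pair.
while( std::getline( istr, line ) )
{
  ++lineCount;
  if( line[0] == '#' )
    currentComment.push_back( line );
  else if( line.empty() || std::isspace( static_cast<unsigned char>( line[0] ) ) )
    currentComment.clear( );
  else
  {
    // splitString is a project helper; expect exactly a tag part and a value part.
    auto tag = Utils::string::splitString( line, '=' );
    if( tag.size() != 2 || line.back() == '=' )
    {
      std::cerr << "Wrong tag syntax in line #" << lineCount << std::endl;
      return nullptr;
    }
    tagLines.push_back( line );
    currentComment.clear( );
  }
}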

Note that I don't require the results to be stored in the container types I'm currently using. I can switch to anything that fits better, as long as I get sets of (comment, tagname, value).
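If you would rather keep a line-by-line loop than use a regex, the continuation rule can be handled by appending indented lines to the most recent entry. The following is a minimal sketch, not the asker's code: it assumes C++17 (for std::optional), treats any indented non-blank line as a continuation, attaches '#' lines to the next tag, drops a pending comment at a blank line, and splits each tag line at its first '=' only. The names Entry, trim and parsePairs are hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed entry: the comment block preceding the tag, the tag name, and its value.
struct Entry {
    std::string comment;
    std::string tag;
    std::string value;
};

// Strip leading and trailing blanks (spaces, tabs, carriage returns).
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Reads "tag = value" lines. Indented lines continue the previous value,
// '#' lines are collected as the comment of the *next* tag, and blank lines
// discard any pending comment. Returns std::nullopt on a syntax error.
std::optional<std::vector<Entry>> parsePairs(std::istream& in) {
    std::vector<Entry> entries;
    std::string pendingComment;  // comment lines waiting for the next tag
    std::string line;
    while (std::getline(in, line)) {
        const std::string body = trim(line);
        if (body.empty()) {
            pendingComment.clear();
        } else if (line[0] == '#') {
            if (!pendingComment.empty()) pendingComment += '\n';
            pendingComment += trim(line.substr(1));
        } else if (std::isspace(static_cast<unsigned char>(line[0]))) {
            if (entries.empty()) return std::nullopt;  // continuation before any tag
            entries.back().value += '\n' + body;
        } else {
            const auto eq = line.find('=');  // split at the first '=' only
            if (eq == std::string::npos) return std::nullopt;
            Entry entry{pendingComment, trim(line.substr(0, eq)), trim(line.substr(eq + 1))};
            if (entry.tag.empty()) return std::nullopt;
            entries.push_back(std::move(entry));
            pendingComment.clear();
        }
    }
    return entries;
}

// Convenience overload for parsing an in-memory string.
std::optional<std::vector<Entry>> parsePairs(const std::string& text) {
    std::istringstream in(text);
    return parsePairs(in);
}
```

Called on the sample input from the question, parsePairs returns three entries whose values are "value\nvalue continuation", "value2\nvalue continuation2" and "value3", with "comment for tag3" attached to tag3. Splitting at the first '=' (rather than requiring exactly one) lets values contain '='; tighten that check if your format forbids it.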

Generally, regexes add complexity to your code, but in this case a regex seems to be the best solution. A regex like this will capture the first and second parts of each pair:

(?:\s*#.*\n)*(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+)

[Live example]
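The pattern is easier to follow when split into its parts. The sketch below assembles the same regex from commented pieces using C++11 raw string literals, so no backslash has to be doubled; the constant names are hypothetical.

```cpp
#include <regex>
#include <string>

// The answer's pattern, assembled from commented pieces. Inside the raw strings,
// "\n" is the regex escape for a newline and "\s" also matches newlines.
const std::string kCommentLines = R"((?:\s*#.*\n)*)";            // zero or more '#' comment lines, not captured
const std::string kTag          = R"((\w+))";                    // group 1: the tag name
const std::string kEquals       = R"(\s*=\s*)";                  // '=' with optional whitespace around it
const std::string kValue        = R"(((?:[^#=\n]+(?:\n|$))+))";  // group 2: one or more lines with no '#' or '=',
                                                                 // each ending in a newline or end of input

const std::string kPairPattern = kCommentLines + kTag + kEquals + kValue;
const std::regex kPairRegex(kPairPattern);
```

The concatenated kPairPattern is character-for-character the pattern above, so kPairRegex can be used wherever that regex is. The [^#=\n] class is what stops a value at the next tag line (which contains '=') or the next comment line (which starts with '#').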

To use a regex_iterator on an istream, you'll need to either slurp the whole stream into a string or use boost::regex_iterator with the boost::match_partial flag. Assuming the istream has been slurped into the string input, this code will extract the pairs:

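#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    // Slurp the whole stream so the regex can match across line boundaries.
    const std::string input{std::istreambuf_iterator<char>(std::cin), std::istreambuf_iterator<char>()};
    const std::regex re("(?:\\s*#.*\\n)*(\\w+)\\s*=\\s*((?:[^#=\\n]+(?:\\n|$))+)");

    for (std::sregex_iterator i(input.cbegin(), input.cend(), re); i != std::sregex_iterator(); ++i) {
        const std::string tag = i->str(1);    // capture group 1: the tag name
        const std::string value = i->str(2);  // capture group 2: the value, including its newlines
        std::cout << tag << ':' << value << std::endl;
    }
}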

[Live example]
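Note that group 2 keeps the indentation of continuation lines and the trailing newline: for the sample input, the value captured for tag is "value\n  value continuation\n" rather than the "value\nvalue continuation" requested in the question. A minimal post-processing sketch, assuming a std::map is an acceptable container and that leading and trailing blanks on each value line may be dropped (normalizeValue and extractPairs are hypothetical names):

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Trims blanks from each line of a captured value, skips blank lines, and
// drops the trailing newline: "value\n  value continuation\n" -> "value\nvalue continuation".
std::string normalizeValue(const std::string& raw) {
    std::istringstream lines(raw);
    std::string line, out;
    while (std::getline(lines, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        if (!out.empty()) out += '\n';
        out += line.substr(first, last - first + 1);
    }
    return out;
}

// Collects every tag/value pair matched by the answer's regex into a map.
std::map<std::string, std::string> extractPairs(const std::string& input) {
    static const std::regex re("(?:\\s*#.*\\n)*(\\w+)\\s*=\\s*((?:[^#=\\n]+(?:\\n|$))+)");
    std::map<std::string, std::string> pairs;
    for (std::sregex_iterator i(input.cbegin(), input.cend(), re), end; i != end; ++i)
        pairs[i->str(1)] = normalizeValue(i->str(2));
    return pairs;
}
```

In this sketch a repeated tag overwrites the earlier value, and std::map orders entries by tag name; use a std::vector of pairs instead if input order or duplicates matter.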

This obviously goes beyond the original question, since it parses out tags and values instead of just grabbing the line. A fair amount of the functionality used here (<regex>, std::sregex_iterator) was introduced in C++11, so if you have any questions, please comment below.
