How to parse pairs of strings over multiple lines?

Question

I'd like to parse content like:

tag = value
tag2 = value2
tag3 = value3

with the relaxation that values may span multiple lines and that comments preceding the next tag are disregarded. A tag is identified by starting at the beginning of a new line and not starting with the comment identifier '#'. So this:

tag = value
  value continuation
tag2 = value2
  value continuation2
# comment for tag3
tag3 = value3

should parse to the mapping:

tag : "value\nvalue continuation"
tag2 : "value2\nvalue continuation2"
tag3 : "value3"

How can I achieve this in a clean way? My current code for parsing one-line pairs looks something like this:

while( std::getline( istr, line ) )
{
  ++lineCount;
  if( line[0] == '#' )
    currentComment.push_back( line );
  else if( isspace( line[0]) || line[0] == '\0' )
    currentComment.clear( );
  else
  {
    auto tag = Utils::string::splitString( line, '=' );
    if( tag.size() != 2 || line[line.size() - 1] == '=')
    {
      std::cerr << "Wrong tag syntax in line #" << lineCount << std::endl;
      return nullptr;
    }
    tagLines.push_back( line );
    currentComment.clear( );
  } 
}

Note that I don't require the results to be stored in the container types that are currently used. I can switch to anything that fits better, as long as I get sets of (comment, tagname, value).

Answer 1

Generally regexes add complexity to your code, but in this case a regex seems to be the best solution. A regex like this will capture the first and second parts of your pair:

(?:\s*#.*\n)*(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+)

[Live example]

In order to use a regex_iterator on an istream you'll need to either slurp the stream (read it entirely into a string) or use boost::regex_iterator with the boost::match_partial flag. Say that the istream has been slurped into a string named input. This code will extract the pairs:

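// Same pattern as above; the inner (\n|$) group is non-capturing so that
// only the tag (group 1) and the value (group 2) are captured.
const regex re("(?:\\s*#.*\\n)*(\\w+)\\s*=\\s*((?:[^#=\\n]+(?:\\n|$))+)");

for (sregex_iterator i(input.cbegin(), input.cend(), re); i != sregex_iterator(); ++i) {
    const string tag = (*i)[1];    // first capture: the tag name
    const string value = (*i)[2];  // second capture: the value, including continuation lines

    cout << tag << ':' << value << endl;
}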

[Live example]
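
For completeness, here is a minimal sketch of the "slurp, then iterate" approach wrapped into a function that returns a map. The names slurp and parseTags are hypothetical helpers, not part of the original answer, and stripping the trailing newline from each value is an added assumption about what the caller wants. Note that the regex keeps the leading indentation of continuation lines, so the values are not trimmed exactly as in the mapping shown in the question.

```cpp
#include <istream>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: read the whole stream into one string so that
// std::sregex_iterator can scan across line boundaries.
std::string slurp(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: collect tag -> value pairs using the regex from the answer.
std::map<std::string, std::string> parseTags(std::istream& in) {
    static const std::regex re(
        R"((?:\s*#.*\n)*(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+))");

    const std::string input = slurp(in);
    std::map<std::string, std::string> result;

    for (std::sregex_iterator it(input.cbegin(), input.cend(), re), end;
         it != end; ++it) {
        std::string value = (*it)[2].str();
        // Assumption: the caller does not want the newline that ends the value.
        while (!value.empty() && value.back() == '\n')
            value.pop_back();
        result[(*it)[1].str()] = value;
    }
    return result;
}
```

Calling parseTags on a std::istringstream holding the six-line sample from the question should give three entries, with the comment line for tag3 skipped. If the comments need to be kept as well, the leading comment group of the regex would have to become a capturing group.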

This obviously exceeds the request in the original question: it parses out tags and values instead of just grabbing the line. There is a fair amount of functionality here that is new in C++11, so if there are any questions please comment below.

Answer 1
0 accepted 2015-08-04 16:32:25