How to parse string pairs over multiple lines?
I'd like to parse content like:
tag = value
tag2 = value2
tag3 = value3
with two relaxations: values may span multiple lines, and comment lines that precede the next tag must not be appended to the previous value. A tag is identified as a line that starts at the beginning of a new line and does not start with the comment character '#'. So this:
tag = value
value continuation
tag2 = value2
value continuation2
# comment for tag3
tag3 = value3
should be parsed into the mapping:
tag : "value\nvalue continuation"
tag2 : "value2\nvalue continuation2"
tag3 : "value3"
How can I achieve this in a clean way? My current code for parsing one-line pairs looks something like this:
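while( std::getline( istr, line ) )
{
    ++lineCount;
    if( !line.empty( ) && line[0] == '#' )
        currentComment.push_back( line );   // comment line: keep it for the next tag
    else if( line.empty( ) || std::isspace( static_cast<unsigned char>( line[0] ) ) )
        currentComment.clear( );            // blank or indented line: drop pending comments
    else
    {
        // tag line: expect exactly "tag = value" (Utils::string::splitString is a project helper, not shown)
        auto tag = Utils::string::splitString( line, '=' );
        if( tag.size() != 2 || line.back( ) == '=' )
        {
            std::cerr << "Wrong tag syntax in line #" << lineCount << std::endl;
            return nullptr;
        }
        tagLines.push_back( line );
        currentComment.clear( );
    }
}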
Note that I don't require the results to be stored in the container types I currently use. I can switch to anything that fits better, as long as I get sets of (comment, tagname, value).
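For comparison, the line-by-line loop above can also be extended to handle continuation lines without a regex. This is a minimal sketch, not the asker's code: Entry, trim and parseEntries are hypothetical names, and the line rules in the comments are assumptions read off the example (in particular, that a continuation line is either indented or contains no '=').

```cpp
#include <cctype>
#include <istream>
#include <sstream>    // std::istringstream, handy for feeding test text
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One (comment, tagname, value) set.
struct Entry {
    std::string comment;  // comment lines above the tag, '#' stripped, joined with '\n'
    std::string tag;
    std::string value;    // value lines joined with '\n'
};

// Removes leading and trailing whitespace (including a stray '\r').
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(ws) - first + 1);
}

// Assumed line rules (hypothetical; adjust to the real format):
//   blank line                    -> ends the current value, drops pending comments
//   line starting with '#'        -> comment for the next tag; ends the current value
//   non-indented line with a '='  -> starts a new tag
//   any other line                -> continuation of the current value
std::vector<Entry> parseEntries(std::istream& in) {
    std::vector<Entry> entries;
    std::string pendingComment;
    bool inValue = false;  // may the next line extend entries.back().value?
    std::string line;
    int lineCount = 0;
    while (std::getline(in, line)) {
        ++lineCount;
        const auto eq = line.find('=');
        if (trim(line).empty()) {
            inValue = false;
            pendingComment.clear();
        } else if (line[0] == '#') {
            inValue = false;
            if (!pendingComment.empty()) pendingComment += '\n';
            pendingComment += trim(line.substr(1));
        } else if (!std::isspace(static_cast<unsigned char>(line[0])) && eq != std::string::npos) {
            Entry e{std::move(pendingComment), trim(line.substr(0, eq)), trim(line.substr(eq + 1))};
            if (e.tag.empty())
                throw std::runtime_error("Wrong tag syntax in line #" + std::to_string(lineCount));
            entries.push_back(std::move(e));
            pendingComment.clear();  // moved-from string: reset to a known empty state
            inValue = true;
        } else if (inValue) {
            entries.back().value += '\n';
            entries.back().value += trim(line);
        } else {
            throw std::runtime_error("Value line without a tag in line #" + std::to_string(lineCount));
        }
    }
    return entries;
}
```

Fed the sample from the question through a std::istringstream, this yields the three tags with the expected values, and "comment for tag3" attached to tag3.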
Regexes generally add complexity to your code, but in this case a regex seems to be the best solution. A regex like this will capture the first and second parts of your pair:
(?:\s*#.*\n)*(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+)
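As a reading aid, here is the same pattern assembled from commented pieces. tagValuePattern and tagValueRegex are illustrative names; the pieces are copied verbatim from the regex above, and C++11 raw string literals R"(...)" are used so the backslashes do not have to be doubled.

```cpp
#include <regex>
#include <string>

// Builds the answer's pattern one commented piece at a time.
std::string tagValuePattern() {
    const std::string comments = R"((?:\s*#.*\n)*)";           // zero or more comment lines, not captured
    const std::string name     = R"((\w+))";                   // group 1: the tag name
    const std::string equals   = R"(\s*=\s*)";                 // '=' with optional whitespace around it
    const std::string value    = R"(((?:[^#=\n]+(?:\n|$))+))"; // group 2: one or more lines with no '#' or '='
    return comments + name + equals + value;
}

// Compiles the pattern once (std::regex defaults to the ECMAScript grammar).
const std::regex& tagValueRegex() {
    static const std::regex re(tagValuePattern());
    return re;
}
```

Because a value line may not contain '=', the line "tag2 = value2" cannot be swallowed as a continuation of the previous value, and because it may not contain '#', a comment line ends the value too.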
std::regex_iterator requires bidirectional iterators, which an istream cannot provide, so in order to use a regex_iterator on an istream you'll need to either slurp the stream (read all of it into a string) or use boost::regex_iterator with the boost::match_partial flag. Say that the istream has been slurped into string input. This code will extract the pairs:
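// assumes <iostream>, <regex> and <string> are included and `using namespace std;`
const regex re("(?:\\s*#.*\\n)*(\\w+)\\s*=\\s*((?:[^#=\\n]+(?:\\n|$))+)");
for (sregex_iterator i(input.cbegin(), input.cend(), re); i != sregex_iterator(); ++i) {
    const string tag = i->str(1);    // capture group 1: the tag name
    const string value = i->str(2);  // capture group 2: the value, possibly spanning several lines
    cout << tag << ':' << value << endl;
}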
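The loop above assumes input already holds the whole stream. A self-contained sketch of both steps follows; slurp and parsePairs are illustrative names, not library functions. Group 2 also captures the newline that ends the last value line, so the sketch strips trailing newlines to produce values in the form shown in the question's expected mapping. It reads the whole input into memory, which is assumed to be acceptable for files of this kind.

```cpp
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>   // std::istringstream, handy for testing with in-memory text
#include <string>
#include <utility>
#include <vector>

// Reads the entire stream into one string ("slurping" it).
std::string slurp(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Runs the answer's regex over the slurped text and returns (tag, value) pairs in order.
std::vector<std::pair<std::string, std::string>> parsePairs(std::istream& in) {
    static const std::regex re(R"((?:\s*#.*\n)*(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+))");
    const std::string input = slurp(in);  // must outlive the iterators below
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::sregex_iterator i(input.cbegin(), input.cend(), re), end; i != end; ++i) {
        std::string value = i->str(2);
        // Group 2 ends with the newline of its last line; drop it.
        while (!value.empty() && value.back() == '\n')
            value.pop_back();
        pairs.emplace_back(i->str(1), std::move(value));
    }
    return pairs;
}
```

With the question's sample, this yields tag, tag2 and tag3 with exactly the values listed in the expected mapping.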
This answer obviously goes beyond what the original question asked for: it parses out tags and values instead of just grabbing the line. A fair amount of the functionality here is new in C++11 (<regex>, sregex_iterator, cbegin), so if there are any questions please comment below.
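The answer's regex skips comment lines entirely, while the question asks for (comment, tagname, value) sets. One possible variant, which is a modification of the answer's pattern and not part of it, wraps the comment part in a capture group (group 1, so the tag and value move to groups 2 and 3) and only allows spaces or tabs before '#', so blank lines do not end up in the captured comment. TaggedValue, trim and parseWithComments are hypothetical names.

```cpp
#include <regex>
#include <string>
#include <vector>

// One (comment, tagname, value) set, as the question asks for.
struct TaggedValue {
    std::string comment;  // raw comment lines above the tag, '#' included
    std::string tag;
    std::string value;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    return s.substr(first, s.find_last_not_of(" \t\r\n") - first + 1);
}

// input is the slurped stream, as in the answer.
std::vector<TaggedValue> parseWithComments(const std::string& input) {
    // group 1: comment lines, group 2: tag name, group 3: value
    static const std::regex re(R"(((?:[ \t]*#.*\n)*)(\w+)\s*=\s*((?:[^#=\n]+(?:\n|$))+))");
    std::vector<TaggedValue> result;
    for (std::sregex_iterator i(input.cbegin(), input.cend(), re), end; i != end; ++i)
        result.push_back({trim(i->str(1)), i->str(2), trim(i->str(3))});
    return result;
}
```

For the question's sample, only tag3 gets a non-empty comment, "# comment for tag3"; multi-line comments keep their internal newlines and each line's '#'.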