
How to extract parts from regex in C++?

For example, I have patterns like this:

new line means "followed by"

delimiter string,
name,
':' character,
list of Xs, where each X is a name followed by a ';' character

I can use regex for matching, but is there a way to not only match, but also extract parts from the pattern? For example:

$DatasetName: A; B; C;

is a given string, and I would like to extract the dataset name, and then the column names A, B, and C.

Well, as already suggested, you could parse it by hand, similar to this (it is for demonstration purposes only and does not claim to be perfect):

#include <iostream>
#include <vector>
#include <string>

bool parse_by_hand(const std::string& phrase)
{
    enum parse_state
    {
        parse_name,
        parse_value,
    };
    std::string name, current_value;
    std::vector<std::string> values;
    parse_state state = parse_name;
    for(std::string::const_iterator iterator = phrase.begin(); iterator != phrase.end(); iterator++)
    {
        switch(state)
        {
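        case parse_name:
            // Everything up to the first ':' is the name (the leading '$' is kept).
            if(*iterator != ':')
                name += *iterator;
            else
                state = parse_value;
            break;
        case parse_value:
            // Accumulate characters until ';' closes the current value.
            // Whitespace is not trimmed, so the values come out as " A", " B", " C".
            if(*iterator != ';')
                current_value += *iterator;
            else
            {
                values.push_back(current_value);
                current_value.clear();
            }
            break;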
        default:
            return false;
        }
    }
    // Error checking here, name parsed? values parsed?
    return true;
}

int main(int argc, char** argv)
{
    std::string phrase("$DatasetName: A; B; C;");
    parse_by_hand(phrase);
}

As for std::regex, my first shot was something like ([^:]*):(([^;]*);)* but, unless I'm mistaken (and I hope someone corrects me if I am), the repeated capture group will give you only the last matched value, not all values. You would therefore still have to do multiple iterations with regex_search, which takes the ease of 'one-liner regex matching' off the table. Alternatively, if std::regex is not a must and you can use Boost, take a look at Repeated captures; this should solve the capture group issue.
