Regex in std c++

I want to find all occurrences of something like this: '{some text}'.

My code is:

std::wregex e(L"(\\{([a-z]+)\\})");
std::wsmatch m;

std::regex_search(chatMessage, m, e);
std::wcout << "matches for '" << chatMessage << "'\n";
for (size_t i = 0; i < m.size(); ++i) {
    std::wssub_match sub_match = m[i];
    std::wstring sub_match_str = sub_match.str();
    std::wcout << i << ": " << sub_match_str << '\n';
}

But for a string like L"Roses {aaa} {bbb} are {ccc} #ff0000", my output is:

0: {aaa}
1: {aaa}
2: aaa

and I don't get the remaining substrings. I suspect that something is wrong with my regular expression. Does anyone see what is wrong?

You're searching once and simply looping through the groups. You instead need to search multiple times and return the correct group only. Try:

std::wregex e(L"(\\{([a-z]+)\\})");
std::wsmatch m;

std::wcout << "matches for '" << chatMessage << "'\n";
while (std::regex_search(chatMessage, m, e))
{
    std::wssub_match sub_match = m[2];
    std::wstring sub_match_str = sub_match.str();
    std::wcout << sub_match_str << '\n';
    chatMessage = m.suffix().str(); // this advances the position in the string
}

The index 2 here selects the second capture group, i.e. the second parenthesized subexpression, ([a-z]+).

See this for more on groups.

There is nothing wrong with the regular expression, but you need to search with it repeatedly. And then you don't really need the parentheses anyway, since the whole match is always available as m[0].

std::regex_search finds one occurrence of the pattern. That's the {aaa}. The std::wsmatch describes just that one match. It has 3 submatches: the whole match, the content of the outer parentheses (which is the whole match again), and the content of the inner parentheses. That's what you are seeing.

You have to call regex_search again on the rest of the string to get the next match:

std::wstring::const_iterator begin = chatMessage.begin(), end = chatMessage.end();
while (std::regex_search(begin, end, m, e)) {
    // ...
    begin = m[0].second; // one past the end of the whole match (m.end() would iterate over the submatches instead)
}
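To make the loop above concrete, here is a minimal sketch that wraps it in a function. The name find_braced is made up for illustration, and the pattern is reduced to a single capture group on the assumption that only the text between the braces is wanted.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collects the text inside
// every {...} found in `text`, searching repeatedly from where the
// previous match ended.
std::vector<std::wstring> find_braced(const std::wstring& text) {
    static const std::wregex pattern(L"\\{([a-z]+)\\}");
    std::vector<std::wstring> found;
    std::wsmatch m;
    std::wstring::const_iterator begin = text.begin();
    const std::wstring::const_iterator end = text.end();
    while (std::regex_search(begin, end, m, pattern)) {
        found.push_back(m[1].str());  // group 1: the letters between the braces
        begin = m[0].second;          // resume right after the whole match
    }
    return found;
}
```

Unlike reassigning chatMessage from m.suffix(), this leaves the original string untouched, which matters if it is still needed after the search.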

The index operator on a match results object (here a std::wsmatch) returns the matching substring at that index. When the index is 0 it returns the entire matching string, which is why the first line of output is {aaa}. When the index is 1 it returns the contents of the first capture group, that is, the text matched by the part of the regular expression that is between the first ( and the corresponding ). In this example, those are the outermost parentheses, which once again produces {aaa}. When the index is 2 it returns the contents of the second capture group, i.e. the text between the second ( and its corresponding ), which gives you the aaa.
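The index numbering can be checked with a small sketch like the one below. The function name first_match_groups is invented for this example; the pattern is the one from the question.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns all submatches of the FIRST match only,
// in index order (0 = whole match, 1 = outer group, 2 = inner group).
std::vector<std::wstring> first_match_groups(const std::wstring& text) {
    const std::wregex pattern(L"(\\{([a-z]+)\\})");
    std::vector<std::wstring> groups;
    std::wsmatch m;
    if (std::regex_search(text, m, pattern)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

For the string from the question this should yield three entries, {aaa}, {aaa} and aaa, matching the output the question shows.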

The easiest way to search again from where you left off is to use an iterator:

std::wsregex_iterator it(chatMessage.begin(), chatMessage.end(), e);
for ( ; it != std::wsregex_iterator(); ++it) {
    std::wcout << it->str() << '\n'; // *it is a std::wsmatch; str() is the whole match, e.g. {aaa}
}

(note: this is a sketch, not tested)
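A slightly fuller version of the iterator approach might look like this. It is a sketch under the assumption that only the inner text is needed; braced_names is a made-up name, and the pattern keeps just one capture group.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: walks every match with std::wsregex_iterator and
// keeps capture group 1 (the text between the braces) from each one.
std::vector<std::wstring> braced_names(const std::wstring& text) {
    static const std::wregex pattern(L"\\{([a-z]+)\\}");
    std::vector<std::wstring> names;
    // A default-constructed iterator serves as the end-of-sequence marker.
    for (std::wsregex_iterator it(text.begin(), text.end(), pattern), last;
         it != last; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}
```

Note that the string passed in has to outlive the iterator, because each match only stores iterators into it.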


 