
C++ Regex: non-greedy match

I'm currently trying to make a regex which matches URL parameters and extracts them.

For example, given the parameter string ?param1=someValue&param2=someOtherValue, std::regex_match should extract the following contents:

  • param1
  • someValue
  • param2
  • someOtherValue

After trying different regex patterns, I finally built one that corresponds to what I want: std::regex("(?:[\\?&]([^=&]+)=([^=&]+))*").

If I take the previous example, std::regex_match matches as expected. However, it does not extract the expected values, keeping only the last captured values.

For example, the following code:

std::regex paramsRegex("(?:[\\?&]([^=&]+)=([^=&]+))*");
std::string arg = "?param1=someValue&param2=someOtherValue";
std::smatch sm;

std::regex_match(arg, sm, paramsRegex);
for (const auto &match : sm)
   std::cout << match << std::endl;

will give the following output:

param2
someOtherValue

As you can see, param1 and its value are skipped and not captured.

After searching on Google, I found that this is due to greedy capture, and I modified my regex into "(?:[\\?&]([^=&]+)=([^=&]+))\\*?" in order to enable non-greedy capturing.

This regex works well when I try it on Rubular, but it does not match when I use it in C++ (std::regex_match returns false and nothing is captured).

I've tried different std::regex_constants options (different regex grammars such as std::regex_constants::grep, std::regex_constants::egrep, ...) but the result is the same.

Does someone know how to do non-greedy regex capture in C++?
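
For what it's worth, laziness on its own may not change the captures here. std::regex_match has to consume the entire input, so a lazy *? on the outer group ends up repeating as often as a greedy * would, and in the default ECMAScript grammar a capture group inside a repeated group holds only the text from its last iteration. A minimal sketch comparing the two (the helper captures_for and the constants kGreedy and kLazy are names made up for illustration, and kLazy assumes a plain *? quantifier):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs std::regex_match and returns every sub-match
// (index 0 is the whole match), or an empty vector when nothing matches.
std::vector<std::string> captures_for(const std::string &pattern,
                                      const std::string &input) {
    const std::regex re(pattern);
    std::smatch sm;
    std::vector<std::string> out;
    if (std::regex_match(input, sm, re))
        for (const auto &m : sm)
            out.push_back(m.str());
    return out;
}

// Greedy pattern from the question, and an assumed lazy variant of it.
const std::string kGreedy = "(?:[\\?&]([^=&]+)=([^=&]+))*";
const std::string kLazy   = "(?:[\\?&]([^=&]+)=([^=&]+))*?";
```

Called on ?param1=someValue&param2=someOtherValue, both patterns should report the whole string as sub-match 0 and then param2 and someOtherValue, which suggests the quantifier itself, rather than its greediness, is what discards the earlier pairs.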

As Casimir et Hippolyte explained in a comment, I just need to:

  • Remove the quantifier
  • Use std::regex_iterator

It gives me the following code:

std::regex paramsRegex("[\\?&]([^=]+)=([^&]+)");
std::string url_params = "?key1=val1&key2=val2&key3=val3&key4=val4";
std::smatch sm;

auto params_it = std::sregex_iterator(url_params.cbegin(), url_params.cend(), paramsRegex);
auto params_end = std::sregex_iterator();

while (params_it != params_end) {
    auto param = params_it->str();

    std::regex_match(param, sm, paramsRegex);
    for (const auto &s : sm)
       std::cout << s << std::endl;

    ++params_it;
}

And here is the output:

?key1=val1
key1
val1
&key2=val2
key2
val2
&key3=val3
key3
val3
&key4=val4
key4
val4

The original regex (?:[\\?&]([^=&]+)=([^=&]+))* was simply changed into [\\?&]([^=]+)=([^&]+).

Then, by using std::sregex_iterator, I get an iterator over each match (?key1=val1, &key2=val2, ...).

Finally, by calling std::regex_match on each sub-string, I can retrieve the parameter names and values.
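
The extra std::regex_match call inside the loop may be avoidable: dereferencing a std::sregex_iterator already yields a std::smatch, whose sub-matches 1 and 2 hold the key and the value. A sketch under that assumption (the function name parse_params and the vector-of-pairs return type are illustrative choices, not part of the code above):

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (key, value) pairs straight from the
// iterator's match results, without re-matching each sub-string.
std::vector<std::pair<std::string, std::string>>
parse_params(const std::string &url_params) {
    static const std::regex paramsRegex("[\\?&]([^=]+)=([^&]+)");
    std::vector<std::pair<std::string, std::string>> params;

    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(url_params.cbegin(), url_params.cend(), paramsRegex), end;
         it != end; ++it) {
        // (*it)[0] is the whole match, e.g. "&key2=val2"; [1] and [2] are the groups.
        params.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return params;
}
```

For the string ?key1=val1&key2=val2&key3=val3&key4=val4 this should return four pairs, key1/val1 through key4/val4, in order.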

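Try using match_results::prefix/suffix:

// in_str is the std::string to scan; it is shortened after every match.
std::string match_expression("your expression");
std::smatch result;
std::regex fnd(match_expression, std::regex_constants::icase);
while (std::regex_search(in_str, result, fnd, std::regex_constants::match_any))
{
    // Print each capture group of the current match (index 0 is the whole match).
    for (std::size_t i = 1; i < result.size(); i++)
    {
        std::cout << result[i].str();
    }
    // Continue the search in the text that follows the current match.
    in_str = result.suffix().str();
}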
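
Applied to the URL-parameter case, the loop above could look roughly like the following sketch. The function name collect_groups is hypothetical, the input is taken by value because the loop consumes it, and the guard against empty matches is an addition to avoid looping forever on patterns that can match nothing:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: repeatedly searches in_str, stores the capture groups
// of each match, then restarts the search in the suffix after that match.
std::vector<std::string> collect_groups(std::string in_str,
                                        const std::string &pattern) {
    const std::regex fnd(pattern, std::regex_constants::icase);
    std::smatch result;
    std::vector<std::string> groups;

    while (std::regex_search(in_str, result, fnd)) {
        for (std::size_t i = 1; i < result.size(); ++i)
            groups.push_back(result[i].str());

        // An empty match would leave the suffix unchanged, so stop here.
        if (result.length(0) == 0)
            break;

        // str() copies the suffix before in_str is overwritten.
        in_str = result.suffix().str();
    }
    return groups;
}
```

With the pattern [\\?&]([^=]+)=([^&]+) and the input ?key1=val1&key2=val2, this should collect key1, val1, key2, val2. Copying the suffix on every iteration is simple but costs more than std::sregex_iterator on long inputs.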
