
regexec does not return multiple matches

I tried to learn POSIX regex with this example, using my own regex and text.

    const char * regex_text = "[[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2},[[:digit:]]{3}";
    const char * find_text = "00:01:54,644 --> 00:01:56,714 --> 00:02:58,589";

The output:

    Trying to find '[[:digit:]]{2}\:[[:digit:]]{2}\:[[:digit:]]{2},[[:digit:]]{3}' in '00:01:54,644 --> 00:01:56,714 --> 00:02:58,589'
    $& is '00:01:54,644' (bytes 0:12)
    $& is '00:01:56,714' (bytes 17:29)
    $& is '00:02:58,589' (bytes 34:46)
    No more matches.

My question is: why was only one match found in each run of the for loop, so that the while loop had to do the job? Shouldn't a single regexec call return all matches in m?

The for loop would catch all the capture groups within a match (groups enclosed in parentheses). So if you had written

([[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2},[[:digit:]]{3}) --> ([[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2},[[:digit:]]{3}) --> ([[:digit:]]{2}\\:[[:digit:]]{2}\\:[[:digit:]]{2},[[:digit:]]{3})

as your regex, your three timestamps would show up in $1, $2, and $3.
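As a rough sketch of that idea, the following compiles a three-group pattern and reads the groups from a single regexec() call. The helper name `capture_group` and the `TS` macro are made up for illustration, the pattern uses `[0-9]` and a plain `:` instead of the original spelling, and `REG_EXTENDED` is assumed because `{2}` and unescaped parentheses need extended syntax.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* One timestamp wrapped in a capture group (hypothetical macro). */
#define TS "([0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})"

/* Hypothetical helper: matches three timestamps separated by " --> " and
   copies the text of capture group `group` (0 = whole match) into `out`.
   Returns 0 on success, -1 on any failure. */
int capture_group(const char *text, size_t group, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[4];   /* m[0] = whole match, m[1]..m[3] = the three groups */

    if (group >= 4)
        return -1;
    if (regcomp(&re, TS " --> " TS " --> " TS, REG_EXTENDED) != 0)
        return -1;

    /* A single call fills all four slots for the one match it finds. */
    int rc = regexec(&re, text, 4, m, 0);
    regfree(&re);
    if (rc != 0 || m[group].rm_so == -1)
        return -1;

    size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
    if (len >= out_size)
        return -1;
    memcpy(out, text + m[group].rm_so, len);
    out[len] = '\0';
    return 0;
}
```

Called on the text from the question, groups 1, 2 and 3 hold the three timestamps and group 0 holds the whole line.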

In your code, however, the regex matches only one timestamp. If you want to catch the next one, you need to execute a new match, which is what the while loop does.

To specifically answer the question, it is normal that a single call to regexec() only returns the first match of the regex, hence the need for an outer loop to iterate through all matches.
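A minimal sketch of such an outer loop is shown here. It is not the code from the linked example; the function name `print_all_matches` is hypothetical, and `REG_EXTENDED` is assumed so that `{2}` works as a repetition count.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every non-overlapping match of `pattern` in
   `text`. Returns the number of matches, or -1 if the pattern is invalid. */
int print_all_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];        /* only the whole match is needed here */
    const char *p = text;   /* where the next search starts */
    int flags = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Each regexec() call reports at most one match, so keep searching
       from the end of the previous one. Offsets are relative to p. */
    while (regexec(&re, p, 1, m, flags) == 0) {
        int start = (int)(p - text) + (int)m[0].rm_so;
        int len = (int)(m[0].rm_eo - m[0].rm_so);
        printf("match %d: '%.*s' (bytes %d:%d)\n",
               count + 1, len, p + m[0].rm_so, start, start + len);
        count++;

        /* Step one extra byte after an empty match so the loop terminates. */
        size_t advance = (size_t)m[0].rm_eo;
        if (m[0].rm_eo == m[0].rm_so) {
            if (p[advance] == '\0')
                break;
            advance++;
        }
        p += advance;
        flags = REG_NOTBOL;  /* p no longer points at the start of the line */
    }

    regfree(&re);
    return count;
}
```

With the timestamp regex and text from the question, this reports three matches, at bytes 0:12, 17:29 and 34:46.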

The confusion comes from the fact that the regmatch_t array only describes one match of the regex (it is an array because it has to contain the offsets of the match itself, and the offsets of each sub-expression within that match).
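To illustrate how that array is sized, regcomp() stores the number of parenthesized sub-expressions in the `re_nsub` field of `regex_t`, so one match needs `re_nsub + 1` slots. The helper below is a hypothetical sketch (the name `slots_needed` is not from the original post) and assumes `REG_EXTENDED` syntax.

```c
#include <regex.h>

/* Hypothetical helper: returns how many regmatch_t slots a single match of
   `pattern` can fill (slot 0 for the whole match plus one per group),
   or -1 if the pattern does not compile. */
int slots_needed(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    int n = (int)re.re_nsub + 1;   /* re_nsub counts the (...) groups */
    regfree(&re);
    return n;
}
```

A pattern without parentheses needs just one slot, which is why the for loop in the question only ever had `$&` to print.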
