I have strings of text in the format below.
*tag0 hi how are you tag1 where are you from tag3 i would like to eat some food*
The strings are stored in a vector, and I assign each one in turn to a string variable, line2. I want to extract the words that follow each tag and count each word as a token. Below is my code.
smatch t_headermatch;
regex re("tag[0-9]+");
for (int i = 0; i < (int)boxraw.size(); ++i) {
    line2 = boxraw.at(i);
    while (regex_search(line2, t_headermatch, re)) {
        for (auto x : t_headermatch) cout << x << " ";
        // If a tag header is found, print the words after it and count each one as a token.
        // Repeat until the next tag header is found; exit if no tag is found.
        cout << endl;
        line2 = t_headermatch.suffix().str();
    }
}
My expected output would be something like this:
Found 3 tag
tag0
hi token 1
how token 2
are token 3
you token 4
tag1
where 1
are 2
you 3
tag3
i 1
would 2
like 3
to 4
eat 5
some 6
food 7
Use the following regex:
"tag\\d+((?:\\s+(?!tag)\\w+)+)"
Each regex_search call fills in a match_results object (here t_headermatch):
t_headermatch[0] : the whole match, e.g. "tag0 hi how are you"
t_headermatch[1] : the substring holding the tokens, e.g. " hi how are you" (including its leading whitespace)
You still need to split that substring into individual tokens and count them yourself.
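To make the splitting and counting step concrete, here is a minimal sketch rather than a definitive implementation. It uses a slight variation of the regex above, with an extra capture group around the tag header so the tag name is available directly. The names TagTokens, parse_tags and print_tags are hypothetical helpers made up for this example. Two limitations follow from the pattern itself: any word that starts with "tag" ends the current token list, and a tag header with no words after it is not matched at all.

```cpp
#include <cstddef>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: one tag header plus the words that follow it.
struct TagTokens {
    std::string tag;
    std::vector<std::string> tokens;
};

// Collect every "tagN" header in `line` together with the words up to the next header.
std::vector<TagTokens> parse_tags(const std::string& line) {
    // Group 1: the tag header. Group 2: the run of words that do not start with "tag".
    static const std::regex re("(tag\\d+)((?:\\s+(?!tag)\\w+)+)");
    std::vector<TagTokens> result;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        TagTokens entry;
        entry.tag = (*it)[1].str();
        // Split the captured substring on whitespace; operator>> skips the leading space.
        std::istringstream words((*it)[2].str());
        for (std::string w; words >> w;) entry.tokens.push_back(w);
        result.push_back(std::move(entry));
    }
    return result;
}

// Print the tags in roughly the layout asked for in the question.
void print_tags(const std::vector<TagTokens>& tags, std::ostream& out) {
    out << "Found " << tags.size() << " tag\n";
    for (const auto& t : tags) {
        out << t.tag << '\n';
        for (std::size_t i = 0; i < t.tokens.size(); ++i)
            out << t.tokens[i] << " token " << (i + 1) << '\n';
    }
}
```

Inside the loop over boxraw, calling print_tags(parse_tags(boxraw.at(i)), std::cout) should then print each tag followed by its numbered tokens. Note that for the sample line, tag1 yields four tokens (where, are, you, from), not three as in the expected output in the question.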