Regex C++: extract substring from a string then count each word
I have strings of text in the format below.
*tag0 hi how are you tag1 where are you from tag3 i would like to eat some food*
The text is stored in a vector (boxraw), and I assign each element to a string variable, line2. I want to extract the words that follow each tag and count each one as a token. Below is my code.
smatch t_headermatch;
regex re("tag[0-9]+");
for (int i = 0; i < (int)boxraw.size(); ++i) {
    line2 = boxraw.at(i);
    while (regex_search(line2, t_headermatch, re)) {
        for (auto x : t_headermatch) cout << x << " ";
        // If a tag header is found, print the words after the header and count each one as a token.
        // Repeat the process until a new tag header is found; exit if no tag is found.
        cout << endl;
        line2 = t_headermatch.suffix().str();
    }
}
My expected output would be something like this:
Found 3 tag
tag0
hi token 1
how token 2
are token 3
you token 4
tag1
where 1
are 2
you 3
tag3
i 1
would 2
like 3
to 4
eat 5
some 6
food 7
Use the following regex:
"tag\\d+((?:\\s+(?!tag)\\w+)+)"
Each successful regex_search call fills the match_results object (t_headermatch in the code above):
t_headermatch[0] : the whole match, i.e. "tag0 hi how are you"
t_headermatch[1] : the substring with the tokens, i.e. " hi how are you" (including the leading whitespace matched by \s+)
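As a minimal sketch of how the matches could be collected, the function below walks one line with std::sregex_iterator rather than re-searching the suffix each time. The helper name find_tag_sections is hypothetical, and the pattern adds one capture group around the tag name (an assumption, not part of the regex above), so the tag becomes sub-match 1 and the token text becomes sub-match 2.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (tag, token text) pairs found in one line.
std::vector<std::pair<std::string, std::string>>
find_tag_sections(const std::string& line) {
    // Same pattern as above, plus an extra capture group around the tag name.
    static const std::regex re(R"((tag\d+)((?:\s+(?!tag)\w+)+))");

    std::vector<std::pair<std::string, std::string>> sections;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        // m[1] is the tag name; m[2] is the token text, including its leading whitespace.
        sections.emplace_back(m[1].str(), m[2].str());
    }
    return sections;
}
```

For the sample line, the returned vector should hold one entry per tag, so its size() could be printed as the "Found 3 tag" count before the sections are listed.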
You also need to split the captured substring into individual tokens and count them.
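One possible way to do the splitting, sketched here under the assumption that tokens are separated only by whitespace, is to read the captured substring through a std::istringstream. Both helper names (split_tokens, number_tokens) are hypothetical, and the "token N" output format simply mirrors the expected output in the question.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the captured token text on whitespace.
// Leading and repeated whitespace is skipped by operator>>.
std::vector<std::string> split_tokens(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) tokens.push_back(word);
    return tokens;
}

// Hypothetical helper: formats the tokens one per line, numbered from 1.
std::string number_tokens(const std::vector<std::string>& tokens) {
    std::ostringstream out;
    for (std::size_t i = 0; i < tokens.size(); ++i)
        out << tokens[i] << " token " << (i + 1) << '\n';
    return out.str();
}
```

Inside the search loop, printing number_tokens(split_tokens(t_headermatch[1].str())) after the tag name would give the per-tag listing, and the size of the token vector is the token count for that tag.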