
Regex C++: extract substring from a string then count each word

I have strings of text in the format below.

*tag0 hi how are you tag1 where are you from tag3 i would like to eat some food*

The text is stored in a vector, and I assign each element to a string variable line2. I want to extract the words that follow each tag and count each word as a token. Below is my code.

smatch t_headermatch;
regex re("tag[0-9]+");

for (int i = 0; i < (int)boxraw.size(); ++i) {
    line2 = boxraw.at(i);

    while (regex_search(line2, t_headermatch, re)) {
        // Print the tag header that was just matched.
        for (auto x : t_headermatch) cout << x << " ";

        // If a tag header is found, print the words after it and count each one as a token.
        // Repeat until the next tag header is found; exit when no tag is left.

        cout << endl;
        // Continue searching in the text after the current match.
        line2 = t_headermatch.suffix().str();
    }
}

My expected output would be something like below:

Found 3 tag

tag0
hi token 1
how token 2
are token 3
you token 4
tag1
where 1 
are  2 
you 3
tag3 
i 1
would 2
like 3
to 4
eat 5
some 6
food 7

Use the following regex:

"tag\\d+((?:\\s+(?!tag)\\w+)+)"

Each successful regex_search call fills a match_results object (here the smatch t_headermatch):

t_headermatch[0] : the whole match, i.e. "tag0 hi how are you"
t_headermatch[1] : the captured substring with the tokens, " hi how are you" (it keeps the leading whitespace because the group starts with \s+)
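To make the two sub-matches concrete, here is a minimal sketch that runs the regex once and returns both of them. The helper name first_tag_section is hypothetical and not part of the original answer; it only wraps std::regex_search and std::smatch.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not from the original answer): returns
// {whole match, captured words} for the first tag section in `line`,
// or two empty strings when no tag section is found.
std::pair<std::string, std::string> first_tag_section(const std::string& line) {
    // Same pattern as in the answer: a tag header followed by one or more
    // words, where the lookahead (?!tag) stops at the next header.
    static const std::regex re("tag\\d+((?:\\s+(?!tag)\\w+)+)");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return {"", ""};
    }
    // m[0] is the whole match, m[1] is the first capture group.
    return {m[0].str(), m[1].str()};
}
```

For the sample line, the first element should be the header plus its words and the second element the words alone, including the whitespace that separates them from the header.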

You still need to split the captured substring into individual tokens and count them.
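One possible way to do the splitting and counting is sketched below. It assumes one change to the pattern that the answer does not make: the tag header gets its own capture group so its name can be printed. TagSection, parse_tags and format_report are hypothetical names chosen for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: one tag header plus the words that follow it.
struct TagSection {
    std::string tag;
    std::vector<std::string> tokens;
};

// Collects every tag section in `line`. Assumption: the answer's pattern
// is extended so that group 1 holds the header and group 2 the words.
std::vector<TagSection> parse_tags(const std::string& line) {
    static const std::regex re("(tag\\d+)((?:\\s+(?!tag)\\w+)+)");
    std::vector<TagSection> sections;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        TagSection section;
        section.tag = (*it)[1].str();
        // operator>> skips whitespace, so the leading space in group 2 is harmless.
        std::istringstream words((*it)[2].str());
        for (std::string word; words >> word;) {
            section.tokens.push_back(word);
        }
        sections.push_back(section);
    }
    return sections;
}

// Builds a report in roughly the layout shown in the question.
std::string format_report(const std::vector<TagSection>& sections) {
    std::ostringstream out;
    out << "Found " << sections.size() << " tag\n";
    for (const TagSection& s : sections) {
        out << s.tag << '\n';
        for (std::size_t i = 0; i < s.tokens.size(); ++i) {
            out << s.tokens[i] << " token " << (i + 1) << '\n';
        }
    }
    return out.str();
}
```

Inside the loop over boxraw, something like cout << format_report(parse_tags(line2)); would print one report per line. Note that with the sample text tag1 yields four tokens, because "from" is matched as well. A word that itself starts with "tag" would end the section early because of the (?!tag) lookahead.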
