Help with C++ Boost::regex
I'm trying to get all words inside a string using Boost::regex in C++.
Here's my input:
"Hello there | network - bla bla hoho"
using this code:
regex rgx("[a-z]+", boost::regex::perl | boost::regex::icase);
regex_search(input, result, rgx);
for (unsigned int j = 0; j < result.size(); ++j)
{
    cout << result[j] << endl;
}
I only get the first word, "Hello". What's wrong with my code? result.size() returns 1.
Thank you.
regex_search only finds the first match. To iterate over all matches, use regex_iterator.
Try
rgx("(?:(\\w+)\\W+)+");
as your regex.
(?: starts a non-marking group, which is closed by the matching )+ so that the words in the string are matched one or more times.
(\\w+) matches letters, digits and underscores one or more times as a marked group, i.e. typical word-like characters, which are returned to you in result[i].
\\W+ matches one or more contiguous non-word characters, i.e. whitespace, |, - etc.
(With a plain regex_search, a marked group inside a repetition holds only its last match.)
You're only searching for alphabetic characters, not spaces, pipes or hyphens. regex_search() probably just returns the first match.
You would need to capture any set of [a-z]+ (or some other regex for matching "words") bound by spaces or string boundaries. You could try something like this:
^(\s*.+\s*)+$
In any event, this isn't really a boost::regex problem, it's just a regex problem. Use perl or the bash shell (or any number of web tools) to get your regex figured out, then use it in your code.
Maybe you can try the regex "(?:([a-z]+)\\b\\s*)+" together with repeated captures.
To match words, try this regex:
regex rgx("\\<[a-z]+\\>",boost::regex::perl|boost::regex::icase);
According to the docs, \\< denotes the start of a word and \\> denotes the end of a word in the Perl variety of Boost regex matching.
I'm afraid someone else has to explain how to iterate the matches. The Boost documentation makes my brain hurt.