C++ Boost::regex help

Question

I'm trying to get all words inside a string using Boost::regex in C++.

Here's my input:

"Hello there | network - bla bla hoho"

using this code:

      regex rgx("[a-z]+", boost::regex::perl | boost::regex::icase);

      regex_search(input, result, rgx);

      for (unsigned int j = 0; j < result.size(); ++j)
      {
          cout << result[j] << endl;
      }

I only get the first word, "Hello". What's wrong with my code? result.size() returns 1.

Thank you.

Answer 1

regex_search only finds the first match. To iterate over all matches, use regex_iterator.
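A minimal sketch of the iterator approach, assuming the input string from the question. It is written against std::regex so that it builds without Boost; that is a substitution, not code from the answer. Boost.Regex offers the same names under boost:: (boost::regex, boost::sregex_iterator) through <boost/regex.hpp>. The helper name find_all_words is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match of [a-z]+ (case-insensitive).
// std::regex is a stand-in here; with Boost, use boost::regex and
// boost::sregex_iterator, and boost::regex::perl instead of ECMAScript.
std::vector<std::string> find_all_words(const std::string& input)
{
    static const std::regex rgx("[a-z]+",
                                std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> words;
    // A default-constructed iterator marks the end of the match sequence.
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), rgx); it != end; ++it) {
        // Each *it is a full match result; str() is the whole match (index 0).
        words.push_back(it->str());
    }
    return words;
}
```

Called with the question's input, this should return the six words Hello, there, network, bla, bla and hoho, one per match.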

Answer 2

Try rgx("(?:(\\w+)\\W+)+"); as your regex. (?: starts a non-marking group, closed by the matching )+, which matches the words in the string one or more times. (\\w+) matches letters, digits and underscores one or more times as a marked group, i.e. typical word-like characters, which are returned to you in result[i]. \\W+ matches one or more contiguous non-word characters, i.e. whitespace, |, - etc.
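One caveat worth checking before relying on result[i] with this pattern: in an ordinary match result, a marked group inside a repetition generally keeps only the text from its last iteration, so a single regex_search does not hand back every word. The sketch below shows that behaviour with std::regex, used as a stand-in so the example builds without Boost; the function name is hypothetical. Boost.Regex documents an optional repeated-captures feature (the BOOST_REGEX_MATCH_EXTRA build option together with the match_extra flag), which is not used here.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs one regex_search with the repeated-group pattern
// and returns every sub-match of the result as a string.
std::vector<std::string> repeated_group_submatches(const std::string& input)
{
    static const std::regex rgx("(?:(\\w+)\\W+)+");

    std::vector<std::string> out;
    std::smatch result;
    if (std::regex_search(input, result, rgx)) {
        // result[0] is the whole match; result[1] is the single marked group.
        for (std::size_t i = 0; i < result.size(); ++i) {
            out.push_back(result[i].str());
        }
    }
    return out;
}
```

For the question's input this returns two entries, the whole match and one word, rather than a list of all words. The final word in the input has no trailing non-word character, so the pattern as written does not consume it either.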

Answer 3

You're only searching for alphabetic characters, not spaces, pipes or hyphens. regex_search() probably just returns the first match.

Answer 4

You would need to capture any set of [a-z]+ (or some other regex for matching "words") bounded by spaces or string boundaries. You could try something like this:

^(\s*.+\s*)+$

In any event, this isn't really a boost::regex problem, it's just a regex problem. Use perl or the bash shell (or any number of web tools) to get your regex figured out, then use it in your code.

Answer 5

Maybe you can try repeated captures with the following regex: "(?:([a-z]+)\\b\\s*)+"

Answer 6

To match words, try this regex:

regex rgx("\\<[a-z]+\\>",boost::regex::perl|boost::regex::icase);

According to the docs, \\< denotes the start of a word and \\> denotes the end of a word in the Perl variety of Boost regex matching.

I'm afraid someone else has to explain how to iterate the matches. The Boost documentation makes my brain hurt.
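For the iteration part, one option is a token iterator, sketched below under a few assumptions. std::regex stands in for Boost so that the example builds without it, and because the ECMAScript grammar used by std::regex has no \< or \>, the sketch puts \b on both sides of the word instead. With Boost, boost::sregex_token_iterator and the \\<[a-z]+\\> pattern from this answer should slot in the same way. The name words_by_boundary is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns each whole-word match as a string.
std::vector<std::string> words_by_boundary(const std::string& input)
{
    // \b is a word boundary; it replaces \< and \> in this std::regex sketch.
    static const std::regex rgx("\\b[a-z]+\\b",
                                std::regex::ECMAScript | std::regex::icase);

    // With the default submatch index (0) the iterator yields each whole match.
    std::sregex_token_iterator first(input.begin(), input.end(), rgx);
    std::sregex_token_iterator last;  // default-constructed: end of sequence

    // Each sub_match converts to std::string when the vector is filled.
    return std::vector<std::string>(first, last);
}
```

The token iterator suits cases where only the matched text is needed; a regex_iterator is the better fit when positions or sub-groups of each match matter.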

6 answers

Answer 1: 5, accepted, 2010-04-07 13:45:14
Answer 2: 1, 2011-05-20 05:10:50
Answer 3: 0, 2010-04-07 13:42:02
Answer 4: 0, 2010-04-07 13:42:40
Answer 5: 0, 2010-04-07 13:42:47
Answer 6: 0, 2010-04-07 13:49:54
