Ruby regular expression to extract words

Question

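I'm struggling to come up with a regular expression that splits a string into words, where a word is defined as a sequence of characters surrounded by whitespace, or enclosed in double quotes. I'm using String#scan.

For example, the string: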

'   hello "my name" is    "Tom"'

should match the words:

hello
my name
is
Tom

I managed to match the words enclosed in double quotes with:

/"([^\"]*)"/

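But I can't figure out how to also handle the words surrounded by whitespace, so that I get 'hello', 'is', and 'Tom' without messing up 'my name'.

Any help would be appreciated!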

Answer 1

result = '   hello "my name" is    "Tom"'.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)

should work for you. In irb it returns:

=> ["", "hello", "\"my name\"", "is", "\"Tom\""]

The empty first element comes from the leading whitespace; just ignore it.

Explanation

\s            # Match a single whitespace character (space, tab, or line break)
   +             # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?=           # Assert that the regex below can be matched, starting at this position (positive lookahead)
   (?:           # Match the group below without capturing it
      [^"]          # Match any character that is NOT a double quote
         *             # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      "             # Match a double quote literally
      [^"]          # Match any character that is NOT a double quote
         *             # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      "             # Match a double quote literally
   )*            # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   [^"]          # Match any character that is NOT a double quote
      *             # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   $             # Assert position at the end of a line (at the end of the string or before a line break character)
)

You can use reject like this to drop the empty strings:

result = '   hello "my name" is    "Tom"'
            .split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/).reject {|s| s.empty?}

which returns:

=> ["hello", "\"my name\"", "is", "\"Tom\""]
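Two things about this answer can be illustrated with a short sketch. First, for single-line input the lookahead succeeds exactly when an even number of double quotes follows the whitespace, which is why the space inside "my name" is not a split point. Second, the result still contains the quote characters, while the question asks for my name and Tom without them. The helper names whitespace_runs and split_words and the constant QUOTE_AWARE_SPACE are hypothetical, introduced only for illustration.

```ruby
# The split pattern from the answer above.
QUOTE_AWARE_SPACE = /\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/

# Hypothetical helper: for each whitespace run, return its offset, the number
# of double quotes after it, and whether that number is even (outside quotes).
def whitespace_runs(text)
  runs = []
  pos = 0
  while (m = text.match(/\s+/, pos))
    quotes_after = m.post_match.count('"')
    runs << [m.begin(0), quotes_after, quotes_after.even?]
    pos = m.end(0)
  end
  runs
end

# Hypothetical helper: split as in the answer, drop empty strings,
# then remove the double quotes that split leaves in place.
def split_words(text)
  text.split(QUOTE_AWARE_SPACE)
      .reject(&:empty?)
      .map { |word| word.delete('"') }
end

text = '   hello "my name" is    "Tom"'
whitespace_runs(text).each do |offset, quotes, outside|
  puts "offset #{offset}: #{quotes} quote(s) follow, split here? #{outside}"
end
p split_words(text)
```

Only the whitespace run at offset 12, inside "my name", is followed by an odd number of quotes, so it is the only one that does not split. Note that delete('"') removes every double quote in a word, which is an assumption that quotes never need to be preserved.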

Answer 2

text = '   hello "my name" is    "Tom"'

text.scan(/\s*("([^"]+)"|\w+)\s*/).each {|match| puts match[1] || match[0]}

This produces:

hello
my name
is
Tom

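Explanation:

zero or more whitespace characters, followed by

either

one or more non-quote characters inside double quotes, or

a single word (\w+),

followed by zero or more whitespace characters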
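One limitation worth noting: \w+ matches only letters, digits, and underscores, so an unquoted word such as don't would be broken apart. A minimal sketch of a variant that follows the question's definition more closely (anything delimited by whitespace, or enclosed in double quotes) is shown here. The helper name extract_words is hypothetical, and the choice of \S+ is an assumption about what counts as a word.

```ruby
# Hypothetical helper: return quoted phrases (without the quotes) and
# whitespace-delimited words, in order of appearance.
def extract_words(text)
  # Group 1 captures the inside of a quoted phrase; group 2 captures a bare word.
  # Each match fills one group and leaves the other nil, so compact drops the nils.
  text.scan(/"([^"]*)"|(\S+)/).flatten.compact
end

p extract_words('   hello "my name" is    "Tom"')
p extract_words(%q(say "hi there" don't))
```

Because the quoted alternative comes first, an opening quote at the start of a token is always consumed as a phrase delimiter. A bare word that contains a quote in the middle, such as ab"cd, would be kept whole by \S+, which may or may not be what you want.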

Answer 3

You can try this regular expression:

/\b(\w+)\b/

It uses \b to find word boundaries. The site http://rubular.com/ is helpful for testing.
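For comparison, here is a quick check of what this pattern returns on the question's input when used with String#scan. The helper name word_boundary_scan is hypothetical.

```ruby
# Hypothetical helper: apply the word-boundary pattern with String#scan.
def word_boundary_scan(text)
  # scan returns one single-element array per match because of the capture group.
  text.scan(/\b(\w+)\b/).flatten
end

p word_boundary_scan('   hello "my name" is    "Tom"')
# => ["hello", "my", "name", "is", "Tom"]
```

The quotes are ignored entirely, so "my name" comes back as two separate words rather than the single word the question asks for.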


3 answers

Answer 1
22 accepted 2011-11-17 05:27:58

Answer 2
4 2011-11-17 05:36:49

Answer 3
1 2012-07-30 13:44:45
