Ruby: splitting a string on more than one character

Question

I have a string, say "Hello_World I am Learning,Ruby". I would like to split this string into its individual words. What is the best way to do that?

Thanks! C.

Answer 1

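You can use \W to match any non-word character. Because \W does not match the underscore, add it to the character class explicitly: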

"Hello_World I am Learning,Ruby".split /[\W_]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

"Hello_World I am Learning,   Ruby".split /[\W_]+/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]
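The two patterns above differ only in the trailing +, which matters when several separators appear in a row. A minimal sketch of the difference, using the same input as the second example (the variable names are illustrative only):

```ruby
text = "Hello_World I am Learning,   Ruby"

# One split per separator character: the run ",   " leaves empty strings.
single = text.split(/[\W_]/)

# One split per run of separator characters: no empty strings.
runs = text.split(/[\W_]+/)

p single  # => ["Hello", "World", "I", "am", "Learning", "", "", "", "Ruby"]
p runs    # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
```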

Answer 2

You can pass a regular expression pattern to String#split as its argument. Like this:

"Hello_World I am Learning,Ruby".split /[ _,.!?]/
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

Answer 3

ruby-1.9.2-p290 :022 > str =  "Hello_World I am Learning,Ruby"
ruby-1.9.2-p290 :023 > str.split(/\s|,|_/)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

Answer 4

String#scan seems like an appropriate method for this task:

irb(main):018:0> "Hello_World    I am Learning,Ruby".scan(/[a-z]+/i)
=> ["Hello", "World", "I", "am", "Learning", "Ruby"]

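Or you can use the built-in \w matcher. Note that \w also matches the underscore, so Hello_World stays in one piece: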

irb(main):020:0> "Hello_World    I am Learning,Ruby".scan(/\w+/)
=> ["Hello_World", "I", "am", "Learning", "Ruby"]
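One practical difference between scan and split shows up when the string starts with a separator: split keeps an empty leading field, while scan only returns what it matches. A small sketch, assuming an input with a leading comma (the input and variable names are made up for illustration):

```ruby
text = ",Hello_World I am Learning,Ruby"

# split treats the leading "," as a separator after an empty first field.
by_split = text.split(/[\W_]+/)

# scan collects the matches themselves, so there is nothing to filter out.
by_scan = text.scan(/[a-z]+/i)

p by_split  # => ["", "Hello", "World", "I", "am", "Learning", "Ruby"]
p by_scan   # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
```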

Answer 5

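Although the examples above work, I think that when splitting a string into words it is better to split on any character that is not considered part of a word. To do that, I use this: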

str =  "Hello_World I am Learning,Ruby"
str.split(/[^a-zA-Z]/).reject(&:empty?).compact

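This statement does the following:

splits the string on any character that is not a letter
then rejects any empty strings
and removes any nil values from the array (split never returns nil, so this last step is only a safeguard)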

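This handles most combinations of words. The examples above require you to list every character you want to split on. Specifying the characters that are not part of a word is much easier.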
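To make the contrast concrete, here is a sketch that compares the explicit list from Answer 2 with the negated class on an input containing separators that the list does not cover. The helper name words and the sample input are hypothetical, not part of the original answer:

```ruby
# Hypothetical helper, not part of the original answer.
def words(str)
  # Split on runs of non-letters, then drop the empty string that a
  # leading separator would leave behind.
  str.split(/[^a-zA-Z]+/).reject(&:empty?)
end

text = "Hello_World;I am-Learning,Ruby"

listed  = text.split(/[ _,.!?]/)  # separators listed explicitly
negated = words(text)             # everything that is not a letter

p listed   # => ["Hello", "World;I", "am-Learning", "Ruby"]
p negated  # => ["Hello", "World", "I", "am", "Learning", "Ruby"]
```

The explicit list misses ; and -, while the negated class needs no update when a new separator shows up. The trade-off is that digits and non-ASCII letters are treated as separators too.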

Answer 6

Just for fun, a Unicode-aware version for 1.9 (or for 1.8 with Oniguruma):

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|\p{Connector_Punctuation}/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

Or perhaps:

>> "This_µstring has words.and thing's".split(/[^\p{Word}']|_/)
=> ["This", "µstring", "has", "words", "and", "thing's"]

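The real problem is deciding which sequences of characters constitute a "word" in this context. You may want to look at the Oniguruma documentation for the character properties it supports; Wikipedia also has some notes on these properties.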
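If a "word" is taken to mean a run of letters and apostrophes, the same idea can be written with scan and the \p{L} (letter) property instead of split. This is only a sketch under that assumed definition, and it requires Ruby 1.9 or later; the µ is written as an escape so the snippet does not depend on the source file encoding:

```ruby
# "\u00B5" is the micro sign (µ), which Unicode classifies as a lowercase letter.
text = "This_\u00B5string has words.and thing's"

# Collect runs of Unicode letters and apostrophes; "_" and "." are not
# letters, so they act as boundaries without being listed.
words = text.scan(/[\p{L}']+/)

p words.length  # => 6
p words.last    # => "thing's"
```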
