Using regex to strip all punctuation and symbols from a string except apostrophes
I'm trying to get this method call:
alternate_words(". . . . don’t let this stop you")
to return every other word in the string, with all punctuation removed except for the apostrophe (').
This is the method definition:
def alternate_words(sentence)
  sentence.gsub(/[^a-z0-9\s']/i, "").split(" ").delete_if.with_index { |word, index| index.odd? }
end
The result is:
["dont", "this", "you"]
The correct words are returned, but the apostrophe is not kept. Changing the regex to:
/[^a-z0-9\s][']/i
returns
[".", ".", "don’t", "this", "you"]
Now it correctly keeps the apostrophe, but it incorrectly includes the periods. I don't understand why.
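A likely explanation, assuming the input really contains the typographic apostrophe shown here (the answer's own character class [’'-] suggests it does): the apostrophe in "don’t" is the right single quotation mark ’ (U+2019), not the ASCII apostrophe ' (U+0027). The first regex therefore strips it together with the periods. The second regex, /[^a-z0-9\s][']/i, only matches a two-character sequence (a symbol immediately followed by an ASCII '), which never occurs in this input, so gsub removes nothing at all: the periods survive, and ’ is kept only because it was never touched. A minimal Ruby sketch checking this; the widened class [^a-z0-9\s'’] at the end is an illustrative fix, not something taken from the thread:

```ruby
s = ". . . . don’t let this stop you"

# The apostrophe in "don’t" is U+2019, not the ASCII apostrophe U+0027.
apostrophe = s[/don(.)t/, 1]
puts apostrophe.ord.to_s(16)                 # prints 2019

# First regex: ’ is not in the allowed set, so it is stripped like the periods.
p s.gsub(/[^a-z0-9\s']/i, "").split(" ")     # => ["dont", "let", "this", "stop", "you"]

# Second regex: needs a symbol followed by an ASCII ', which never occurs,
# so gsub removes nothing and the periods survive.
p s.gsub(/[^a-z0-9\s][']/i, "") == s         # => true

# Illustrative fix: allow both apostrophe variants in the negated class,
# which keeps "don’t" intact and still drops the periods.
p s.gsub(/[^a-z0-9\s'’]/i, "").split(" ")
```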
You can instead match words containing apostrophes and hyphens directly with scan:
def alternate_words(sentence)
  sentence.scan(/[[:alnum:]]+(?:[’'-][[:alnum:]]+)*/).delete_if.with_index { |_, index|
    index.odd?
  }
end
p alternate_words(". . . . . don’t let this stop you")
# => ["don’t", "this", "you"]
The [[:alnum:]]+(?:[’'-][[:alnum:]]+)* pattern may be enclosed in word boundaries, \b, if you only want to match whole words: /\b[[:alnum:]]+(?:[’'-][[:alnum:]]+)*\b/.
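To see what the word boundaries change, here is a small Ruby comparison; the sample string is made up for illustration. \b is defined in terms of word characters (letters, digits and the underscore), while [[:alnum:]] excludes the underscore, so letters glued to an underscore no longer count as a whole word once \b is added:

```ruby
pattern = /[[:alnum:]]+(?:[’'-][[:alnum:]]+)*/
bounded = /\b[[:alnum:]]+(?:[’'-][[:alnum:]]+)*\b/

text = "foo_bar baz don't"

# Without \b, the underscore simply splits "foo_bar" into two matches.
p text.scan(pattern)   # => ["foo", "bar", "baz", "don't"]

# With \b, "foo" and "bar" are rejected: the "_" next to them is a word
# character, so there is no boundary between it and the letters.
p text.scan(bounded)   # => ["baz", "don't"]
```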
Details:
[[:alnum:]]+ - 1 or more alphanumeric characters
(?:[’'-][[:alnum:]]+)* - zero or more (due to *; replace it with another quantifier as required) occurrences of:
    [’'-] - a typographic apostrophe, a straight apostrophe or a hyphen (the list may be adjusted)
    [[:alnum:]]+ - 1 or more alphanumeric characters.
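The effect of the quantifier is easiest to see side by side. A small Ruby sketch with a made-up sample string, comparing the answer's * with ?, which is just one possible "other quantifier" (it allows at most one apostrophe or hyphen per word):

```ruby
many = /[[:alnum:]]+(?:[’'-][[:alnum:]]+)*/   # any number of joined parts
one  = /[[:alnum:]]+(?:[’'-][[:alnum:]]+)?/   # at most one joined part

text = "rock'n'roll state-of-the-art students' books"

p text.scan(many)  # => ["rock'n'roll", "state-of-the-art", "students", "books"]
p text.scan(one)   # => ["rock'n", "roll", "state-of", "the-art", "students", "books"]
```

Note that with either quantifier a trailing apostrophe, as in "students'", is not kept, because every [’'-] must be followed by at least one alphanumeric character.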