
Ruby regular expression to extract words in a string that contain no spaces

Say I have the string str = "ASimpleNoSpaceTitle". I can't seem to wrap my head around how to use a regexp to split it and extract all the capitalized words, so that I get ["A", "Simple", "No", "Space", "Title"].

What's a regular expression that will do the job?

UPDATE: What about a string of words with and without spaces/upper-case? For example, turning "ASimpleNoSpaceTitle and a subtitle" into ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"].

Using String#scan with character class ranges will get you what you want with a simple, easy-to-understand regex:

str = "ASimpleNoSpaceTitle"
str.scan(/[A-Z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title"]

You could also use the POSIX bracket expressions [[:upper:]] and [[:lower:]], which allow your regex to handle non-ASCII letters such as À or ç as well:

str = "ÀSimpleNoSpaçeTitle"
str.scan(/[A-Z][a-z]*/) # => ["Simple", "No", "Spa", "Title"]
str.scan(/[[:upper:]][[:lower:]]*/) # => ["À", "Simple", "No", "Spaçe", "Title"]
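If the pattern is needed in more than one place, it can be wrapped in a small method. The following is only a sketch: the method name `capitalized_words` is hypothetical and not part of the answer above; the regex is the POSIX-bracket version shown there.

```ruby
# Hypothetical helper (the name is an assumption, not part of the answer).
# Returns every run made of one uppercase letter followed by zero or more
# lowercase letters, including non-ASCII letters.
def capitalized_words(str)
  str.scan(/[[:upper:]][[:lower:]]*/)
end

capitalized_words("ASimpleNoSpaceTitle")  # => ["A", "Simple", "No", "Space", "Title"]
capitalized_words("ÀSimpleNoSpaçeTitle")  # => ["À", "Simple", "No", "Spaçe", "Title"]
```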

To allow words to begin with a lowercase letter when not preceded by another letter, you can use this variation:

str = "ASimpleNoSpaceTitle and a subtitle"
str.scan(/[A-Za-z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"]
# OR
str.scan(/[[:alpha:]][[:lower:]]*/)

"ABSimpleNoSpaceTitle".split(/(?=[[:upper:]])/)
  #=> ["A", "B", "Simple", "No", "Space", "Title"]

(?=[[:upper:]]) is a positive lookahead. It matches the empty position just before a capital letter without consuming that letter, so split cuts the string in front of every capital.
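One practical difference between the lookahead split and the scan approach shows up when the string starts with a lowercase run. A small sketch, using a made-up input string that is not from the question:

```ruby
str = "camelCaseTitle"  # illustrative input, not from the question

# split cuts at every position that is followed by an uppercase letter,
# so the leading lowercase run is kept as its own element.
by_split = str.split(/(?=[[:upper:]])/)
# => ["camel", "Case", "Title"]

# scan returns only the matches themselves, and each match must start
# with an uppercase letter, so "camel" is dropped.
by_scan = str.scan(/[[:upper:]][[:lower:]]*/)
# => ["Case", "Title"]
```

Which behaviour is preferable depends on whether leading lowercase text should count as a word.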

The correct way to do this in 2016 is:

"ASimpleNoSpaceTitle and a subtitle".split(/(?=\p{Lu})|\s+/)
#⇒ ["A","Simple","No","Space","Title","and","a","subtitle"]
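In that pattern, \p{Lu} is the Unicode property for uppercase letters, and the \s+ alternative additionally splits on (and discards) runs of whitespace. A short sketch with an illustrative input containing a non-ASCII capital and a double space:

```ruby
# Illustrative input (an assumption, not from the question).
str = "ÀSimpleTitle  and a subtitle"

# Cut before every uppercase letter, or on any run of whitespace.
words = str.split(/(?=\p{Lu})|\s+/)
# => ["À", "Simple", "Title", "and", "a", "subtitle"]
```

Note that a string beginning with whitespace would yield an empty first element, since split keeps leading empty fields.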

Here is one way to do it. Pass this regex to the built-in scan() method:

/[[:upper:]](?:[[:lower:]]+)?/

All the regex does is find an uppercase letter, [[:upper:]], that is optionally followed by one or more lowercase letters, (?:[[:lower:]]+)?.

scan returns every occurrence of the pattern in the string, not just the first one.
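As a side note, the optional non-capturing group (?:[[:lower:]]+)? matches the same text as the shorter [[:lower:]]*. A quick sketch to check this; the variable names are only for illustration:

```ruby
str = "ASimpleNoSpaceTitle"

verbose = /[[:upper:]](?:[[:lower:]]+)?/  # optional group of one or more lowercase letters
compact = /[[:upper:]][[:lower:]]*/       # zero or more lowercase letters

verbose_result = str.scan(verbose)
# => ["A", "Simple", "No", "Space", "Title"]

same = (verbose_result == str.scan(compact))
# => true
```

The group is written as non-capturing, (?: ... ), on purpose: when a pattern contains capturing groups, scan returns the captured groups rather than the whole match.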

irb(main):001:0> str = "ASimpleNoSpaceTitle"
=> "ASimpleNoSpaceTitle"

irb(main):050:0> str.scan(/[[:upper:]](?:[[:lower:]]+)?/)
=> ["A", "Simple", "No", "Space", "Title"]
