
How to extract all words of a camel cased string with a regular expression?

Assume I have a string that consists of multiple words. These words aren't separated by spaces, but every word starts with a capital letter. This type of naming convention is usually called "camel case". Some examples:

  • ApplicationRecord
  • CamelCase
  • FirstNumberAfterACharacter

Now I want to split these strings into single words, so FirstNumberAfterACharacter becomes ["First", "Number", "After", "A", "Character"] for example.

Finding a regular expression that matches those strings is also quite easy: ^([A-Z][a-z]*)+$. But if I try to get all matches, this regular expression will only return the last match:

irb(main):003:0> /^([A-Z][a-z]*)+$/.match('FirstNumberAfterACharacter').captures
=> ["Character"]

irb(main):004:0> 'FirstNumberAfterACharacter'.scan(/^([A-Z][a-z]*)+$/)
=> [["Character"]]

So how do I get all matches, not just the last one?

I changed your regexp to:

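Start with a group (...) that consists of a single capital letter, [A-Z]{1}, followed by zero or more characters that are not capital letters, [^A-Z]*. The ^ and $ anchors and the outer + are dropped so that scan can match each word separately.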

'FirstNumberAfterACharacter'.scan(/([A-Z][^A-Z]*)/).flatten(1)
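To make the difference concrete, here is a small sketch comparing the pattern from the question with the one above. The variable names are only illustrative. It relies on two behaviours of String#scan: a repeated capture group keeps only its final iteration, and a pattern containing a group makes scan return one array per match.

```ruby
s = 'FirstNumberAfterACharacter'

# The anchored pattern matches the whole string once, and the repeated
# capture group retains only its last iteration.
last_only = s.scan(/^([A-Z][a-z]*)+$/)

# One match per word; the capture group wraps each result in an array,
# which is why flatten is needed.
with_group = s.scan(/([A-Z][^A-Z]*)/).flatten

# Without a capture group, scan returns the matched strings directly.
without_group = s.scan(/[A-Z][^A-Z]*/)

p last_only      # => [["Character"]]
p with_group     # => ["First", "Number", "After", "A", "Character"]
p without_group  # => ["First", "Number", "After", "A", "Character"]
```

If the group is not needed for anything else, dropping it avoids the flatten call.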

You can use a regex that extracts any Unicode uppercase letter followed by zero or more characters that are not uppercase letters:

'FirstNumberAfterACharacter'.scan(/\p{Lu}\P{Lu}*/)
# => ["First", "Number", "After", "A", "Character"]

See the Ruby online demo.

Details:

  • \p{Lu} - any Unicode uppercase letter
  • \P{Lu}* - zero or more ( * ) characters other than Unicode uppercase letters.
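The difference from the ASCII-only class [A-Z] only shows up with non-ASCII input. A minimal sketch, assuming a UTF-8 string; the sample word "ÉtéChaud" is made up for illustration and written with \u escapes so the snippet does not depend on the source file's encoding:

```ruby
# Hypothetical sample: capital E with acute, "t", small e with acute, "Chaud".
word = "\u00C9t\u00E9Chaud"

# [A-Z] does not treat the accented capital as a word start,
# so the first word is skipped entirely.
ascii_only = word.scan(/[A-Z][^A-Z]*/)

# \p{Lu} recognises it as an uppercase letter and splits both words.
unicode = word.scan(/\p{Lu}\P{Lu}*/)

p ascii_only  # => ["Chaud"]
p unicode     # two elements: the accented first word, then "Chaud"
```

For plain ASCII identifiers such as FirstNumberAfterACharacter, both patterns give the same result.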


 