How to extract all words of a camel cased string with a regular expression?

Question

Assume I have a string that consists of multiple words. These words aren't separated by spaces, but every word starts with a capital letter. This type of naming convention is usually called "camel case". Some examples:

ApplicationRecord
CamelCase
FirstNumberAfterACharacter

Now I want to split these strings into single words, so FirstNumberAfterACharacter becomes ["First", "Number", "After", "A", "Character"] for example.

Finding a regular expression that matches those strings is quite easy: ^([A-Z][a-z]*)+$. But if I try to get all matches, this regular expression only returns the last one:

irb(main):003:0> /^([A-Z][a-z]*)+$/.match('FirstNumberAfterACharacter').captures
=> ["Character"]

irb(main):004:0> 'FirstNumberAfterACharacter'.scan(/^([A-Z][a-z]*)+$/)
=> [["Character"]]

So how do I get all matches, not just the last one?

Answer 1

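I changed your regexp so that each match is a group (...) that starts with a single capital letter, [A-Z], followed by zero or more characters that are not capital letters, [^A-Z]*. Dropping the ^ and $ anchors and the outer + lets scan return every word as its own match: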

'FirstNumberAfterACharacter'.scan(/([A-Z][^A-Z]*)/).flatten(1)
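As a minimal sketch of what this returns (the variable names are only illustrative): with a capture group, scan yields one sub-array per match, which is why the flatten call is needed. The same pattern without the group should give the flat array directly.

```ruby
input = 'FirstNumberAfterACharacter'

# With a capture group, scan returns one sub-array per match,
# so the result has to be flattened into a plain array of strings.
with_group = input.scan(/([A-Z][^A-Z]*)/).flatten(1)

# Without a capture group, scan returns the matched strings directly.
without_group = input.scan(/[A-Z][^A-Z]*/)

p with_group    # => ["First", "Number", "After", "A", "Character"]
p without_group # => ["First", "Number", "After", "A", "Character"]
```

Note that [^A-Z]* also accepts digits and punctuation, so a string like 'Version2Beta' would be split into "Version2" and "Beta".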

Answer 2

You can use a regex that extracts any Unicode uppercase letter followed by any number of non-uppercase characters:

'FirstNumberAfterACharacter'.scan(/\p{Lu}\P{Lu}*/)
# => ["First", "Number", "After", "A", "Character"]

See the Ruby online demo.

Details:

\p{Lu} - any Unicode uppercase letter
\P{Lu}* - zero or more ( * ) characters other than Unicode uppercase letters.
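To illustrate where the Unicode property makes a difference, here is a small sketch comparing it with the ASCII-only class [A-Z]. The sample string is made up for this example and assumes a UTF-8 source file, which is Ruby's default.

```ruby
# Hypothetical input whose first word starts with a non-ASCII capital letter.
input = 'ÉcoleNormale'

# [A-Z] only knows ASCII capitals, so the word starting with "É" is skipped.
ascii_only = input.scan(/[A-Z][^A-Z]*/)

# \p{Lu} matches any Unicode uppercase letter, so both words are found.
unicode_aware = input.scan(/\p{Lu}\P{Lu}*/)

p ascii_only    # => ["Normale"]
p unicode_aware # => ["École", "Normale"]
```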
