RegEx: Split String Into 1 Or More Groups

I'm using Google Sheets' REGEXEXTRACT function. The data on each row is a string that looks like this:

2020 02 15 - Joe Sephine
2020 02 18 - Candy Kane - Toot Suites - 123 Some Street
2020 02 19 - Badonk Edonk - On A Boat

I need to capture the following groups for each row:

(2020 02 15)(Joe Sephine)
(2020 02 18)(Candy Kane)(Toot Suites)(123 Some Street)
(2020 02 19)(Badonk Edonk)(On A Boat)

The delimiter is always " - " (space hyphen space), but not every row has the same number of delimiters.

Splitting a string seems like it ought to be simple, but my regex skills are rudimentary at best. I've been beating my head against this for about an hour (with the help of regex101.com) and have failed to devise an expression that produces the desired output.

I'm trying variations on this:

^(?>[0-9 ]* - )(.*)( - .*)?  

But my output either captures only the first two groups and not the rest:

(2020 02 15)(Joe Sephine)
(2020 02 18) (Candy Kane)(Toot Suites)(123 Some Street)
(2020 02 19)(Badonk Edonk) (On A Boat)

Or it captures everything after the date as a single group:

(2020 02 15)( - Joe Sephine)
(2020 02 18)( - Candy Kane - Toot Suites - 123 Some Street)
(2020 02 19)( - Badonk Edonk - On A Boat)

I'm open to your suggestions.

If you need to capture 2, 3, or 4 groups, you may use:

^(.*?) - (.*?)(?: - (.*?))?(?: - (.*?))?$

See the regex demo

Details

  • ^ - start of string
  • (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
  • " - " - a space, a hyphen, and a space
  • (.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
  • (?: - (.*?))? - an optional non-capturing group matching 1 or 0 occurrences of
    • " - " - a space, a hyphen, and a space
    • (.*?) - Group 3: any zero or more chars other than line break chars, as few as possible
  • (?: - (.*?))? - an optional non-capturing group matching 1 or 0 occurrences of
    • " - " - a space, a hyphen, and a space
    • (.*?) - Group 4: any zero or more chars other than line break chars, as few as possible
  • $ - end of string.
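
As a quick sanity check outside Sheets, the pattern can be run against the three sample rows from the question in Python. This is only a sketch: the helper name split_row is made up for illustration, and Python's re module is a different engine from the one Google Sheets uses, so treat it as a way to confirm the grouping logic rather than as proof of how REGEXEXTRACT behaves.

```python
import re

# Pattern from the answer: two mandatory fields plus up to two optional ones.
PATTERN = re.compile(r"^(.*?) - (.*?)(?: - (.*?))?(?: - (.*?))?$")


def split_row(row):
    """Return the captured fields of one row (hypothetical helper).

    Optional groups that did not take part in the match come back
    as None and are dropped from the result.
    """
    match = PATTERN.match(row)
    if match is None:
        return []  # the row contains no " - " delimiter at all
    return [group for group in match.groups() if group is not None]


rows = [
    "2020 02 15 - Joe Sephine",
    "2020 02 18 - Candy Kane - Toot Suites - 123 Some Street",
    "2020 02 19 - Badonk Edonk - On A Boat",
]

for row in rows:
    print(split_row(row))
```

Each row yields one list entry per field, so the first row gives two entries, the second four, and the third three. A row with more than four fields would still match, but everything after the third delimiter would land in Group 4 with its " - " separators intact.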
