
Regex Group Multiple Occurrences

I have the following string to be parsed:

Field 1:Value 1
Overriden Field 2:
        Value 2.1
        Value 2.2
Field 3: 
        Value 3
Overriden Field 4:Value 4
Field 5:Value5

Basically, each field is separated from its value by a colon, and a field name (which doesn't always start with "Field ...") starts on a new line and is followed by a colon. I want to extract the overridden field-value pairs, so that I end up with two (or more) strings: one like "Overriden Field 2:...Value 2.2" and one like "Overriden Field 4:Value 4".

I don't know how many overridden fields there are, but they all start with "Overriden". I'm not sure whether grouping can help.

The best I can think of is to use re.findall() to search for occurrences of "Overriden[^:]*:[^:]*:?", which gives me two results:

  • Overriden Field 2:...Field 3:
  • Overriden Field 4:...Field 5:

Then I would have to chop off the trailing part that matches "\n[^:]*:". This doesn't look very smart.

Would anyone like to give some advice?

You can perhaps use something like this:

\s*([^:]+)\s*:\s*((?:[^:](?![^:\n]+:))+)\s*

[I added the \s* only to keep surrounding spaces and/or newlines out of the captures; they can be removed without changing the core content that is captured.]
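As a rough sketch of how this pattern could be applied in Python (the question mentions re.findall()), the snippet below runs it over the sample input and keeps only the pairs whose field name starts with the literal "Overriden". The variable names (text, pair_re, pairs, overridden) are illustrative assumptions and not part of the original answer.

```python
import re

# Sample input from the question (built without a trailing newline).
text = "\n".join([
    "Field 1:Value 1",
    "Overriden Field 2:",
    "        Value 2.1",
    "        Value 2.2",
    "Field 3: ",
    "        Value 3",
    "Overriden Field 4:Value 4",
    "Field 5:Value5",
])

# The pattern from this answer: group 1 is the field name, group 2 the value.
pair_re = re.compile(r"\s*([^:]+)\s*:\s*((?:[^:](?![^:\n]+:))+)\s*")

# With two capture groups, findall returns one (name, value) tuple per match.
pairs = pair_re.findall(text)

# Keep only the fields whose name starts with the literal "Overriden".
overridden = [(name, value) for name, value in pairs
              if name.startswith("Overriden")]

for name, value in overridden:
    print(f"{name}:{value}")
```

Note that a multi-line value keeps its internal newlines and indentation, so it may need further normalising (for example, splitting on lines and stripping each one) depending on what the result is used for.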


The regex started as:

([^:]+):([^:]+)

Then I changed the second part to ((?:[^:](?![^:\n]+:))+). The negative lookahead only lets a character into the value if the text after it does not run up to a : on the same line (which would mean the value is running into the next field name).
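To make the effect of the lookahead concrete, here is a small Python comparison of the starting regex and the refined one on a two-line input. The input string and the names naive and guarded are assumptions made up for this illustration; the \s* padding is left out on purpose so that only the lookahead differs.

```python
import re

# Hypothetical two-field input, small enough to trace by hand.
sample = "Field 1:Value 1\nOverriden Field 4:Value 4"

# Starting point: the value is simply "everything up to the next colon".
naive = re.compile(r"([^:]+):([^:]+)")

# Refined: a value character is only accepted if it is not followed by
# non-colon, non-newline text that ends in a colon (i.e. a field name).
guarded = re.compile(r"([^:]+):((?:[^:](?![^:\n]+:))+)")

naive_pairs = naive.findall(sample)
guarded_pairs = guarded.findall(sample)

print(naive_pairs)
print(guarded_pairs)
```

With the starting regex, the first value swallows the newline and the whole next field name, so the second pair is lost. With the lookahead, the value stops at the end of its line and both pairs are found, although the second name keeps the leading newline, which is what the optional \s* in the full pattern absorbs.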
