
regex to split on pipe character except when in square brackets

As the title suggests, I would like to split values on the | character, except when the | character is nested inside square brackets, e.g. [|].

For example, taking the text of:

H3609|E1.7|E1.3|D09[7|9]

which I would like to split into ["H3609", "E1.7", "E1.3", "D09[7|9]"].

So far I have tried something very basic like [A-z0-9\.]*, which (assuming Python and re.findall()) gives back:

["H3609", "E1.7", "E1.3", "D09[7", "9]"]

Any suggestions?

Thanks in advance!

You can use

re.findall(r'(?:\[[^][]*]|[^][|])+', text)

See the regex demo.

Details:

  • (?: - start of a non-capturing group that groups two alternatives:
    • \[[^][]*] - a [ char, then zero or more chars other than [ and ], and then a ] char
    • | - or
    • [^][|] - any single char other than ], [ and |
  • )+ - end of the group, which is repeated one or more times.
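If it helps to see the breakdown above next to the pattern itself, here is a minimal sketch of the same regex written with re.VERBOSE, so each part carries its own comment. The name FIELD_RE is only my own label for illustration; the pattern is unchanged.

```python
import re

# FIELD_RE is a hypothetical name; the pattern is the one from the answer,
# only spread over several lines with re.VERBOSE.
FIELD_RE = re.compile(r"""
    (?:             # group the two alternatives without capturing
        \[[^][]*]   # a whole bracketed chunk: "[", any non-bracket chars, "]"
      |             # or
        [^][|]      # one char that is not "]", "[" or "|"
    )+              # one or more repetitions make up a single field
""", re.VERBOSE)

print(FIELD_RE.findall('H3609|E1.7|E1.3|D09[7|9]'))
# => ['H3609', 'E1.7', 'E1.3', 'D09[7|9]']
```

Note that this assumes the brackets are balanced and not nested.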

See a Python demo:

import re
text = 'H3609|E1.7|E1.3|D09[7|9]'
print(re.findall(r'(?:\[[^][]*]|[^][|])+', text))
# => ['H3609', 'E1.7', 'E1.3', 'D09[7|9]']
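One thing to be aware of: since re.findall only returns non-empty matches here, empty fields (two pipes in a row) are silently dropped. If you need to keep them, a possible alternative is to split on the pipes directly and use a negative lookahead to skip any pipe that is followed by a ] before the next bracket. This is only a sketch, not part of the solution above; the helper name split_outside_brackets is made up for the example, and it assumes brackets are balanced and not nested.

```python
import re

def split_outside_brackets(text):
    # Hypothetical helper: split on "|" unless the next bracket char
    # after it is "]", which would mean the pipe sits inside [...].
    return re.split(r'\|(?![^\[\]]*\])', text)

print(split_outside_brackets('H3609|E1.7|E1.3|D09[7|9]'))
# => ['H3609', 'E1.7', 'E1.3', 'D09[7|9]']

# Empty fields are kept, unlike with the findall approach
print(split_outside_brackets('H3609||D09[7|9]'))
# => ['H3609', '', 'D09[7|9]']
```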
