Splitting a string at spaces and colons, ignoring those inside brackets (including nested brackets)

I would like to split a string at spaces and colons, except where they appear inside curly braces {} or round brackets (). Similar questions have been asked, but their answers fail when the brackets are nested.

Here is an example of a string to split:

p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}

The actual goal is to obtain the keys (p1, p2, p3 and p4) together with their values. When I split the string at spaces and colons, I can avoid splitting inside the curly braces, but I cannot avoid splitting at some of the spaces inside the round brackets, because those brackets are nested.

The closest I got is this pattern:

[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))

It works except that it still splits inside the outer parentheses, for example between (('mean', 5), and 0.0.
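A quick way to see the failure, assuming the pattern is applied with Python's built-in re.split (the question does not show how it is called): the lookahead only inspects the next bracket character ahead. Inside the outer parentheses, after (('mean', 5), the next bracket ahead is the ( of ('std', 2), so the separator looks as if it were outside any brackets. A lookahead has no way to count nesting depth.

```python
import re

text = "p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}"

# The attempt from the question: split on a run of whitespace/colons, but only
# if the next bracket character ahead is an opening one (or there is none).
attempt = r"[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))"
parts = re.split(attempt, text)
print(parts)
# -> ['p1', 'I/out', 'p2', "(('mean', 5),", '0.0,', "('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
```

The tuple comes out in three pieces, while the braced dictionary stays whole because a closing } is reached before any opening bracket or the end of the string.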

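You can use the following pattern, which works in PCRE and in the third-party Python regex module from PyPI (Python's built-in re module does not support the (?1)/(?2) recursion it relies on):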

(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+

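It matches:

  • (?: - start of a non-capturing container group:
    • (\((?:[^()]++|(?1))*\)) - Group 1: a substring enclosed in balanced, possibly nested round brackets; [^()]++ possessively matches runs of non-bracket characters, and (?1) recursively applies the Group 1 pattern to each inner bracketed part
    • | - or
    • (\{(?:[^{}]++|(?2))*}) - Group 2: the same for curly braces, recursing with (?2)
    • | - or
    • [^\s:] - any character other than whitespace or a colon
  • )+ - one or more occurrences, so each match is one token in which spaces and colons are allowed only inside brackets.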
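If installing the third-party regex module is not an option, a plain loop that tracks bracket depth produces the same tokens with only the standard library. This is a different technique from the recursive regex, sketched under some assumptions: ( and { are counted together without checking that they pair correctly, the input is assumed to be balanced, and brackets inside quoted strings (such as '(') are not treated specially, which the regex does not handle either. The function names split_outside_brackets and is_separator are hypothetical.

```python
def is_separator(ch: str) -> bool:
    # Roughly mirrors the regex class [\s:]: any whitespace character or a colon
    return ch.isspace() or ch == ":"


def split_outside_brackets(text: str) -> list[str]:
    """Hypothetical helper: split at separators that are not inside () or {}."""
    tokens: list[str] = []
    current: list[str] = []
    depth = 0  # number of currently open brackets, both kinds counted together
    for ch in text:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1  # assumes balanced input; no validation is done
        if depth == 0 and is_separator(ch):
            # A top-level separator ends the current token; runs of them collapse
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


text = "p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}"
print(split_outside_brackets(text))
```

For this input the result is the same list as the regex version's output. The regex is more compact; the loop is easier to extend, for instance to add [] as a third bracket type.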

Python demo:

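import regex  # third-party module from PyPI (pip install regex); supports (?1) recursion

text = "p1: I/out   p2: (('mean', 5), 0.0, ('std', 2))   p3: 7   p4: {'name': 'check', 'value': 80.0}"
# Each match is one token; bracketed parts may contain spaces and colons
pattern = r"(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+"
print([m.group() for m in regex.finditer(pattern, text)])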

Output:

['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
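Since the actual goal is to pair keys with values, the token list in the output above can be turned into a dictionary. A minimal sketch, assuming the tokens strictly alternate key, value, key, value: it uses ast.literal_eval to turn values that look like Python literals into real objects and keeps everything else (such as I/out) as a plain string. The helper name to_python is hypothetical.

```python
import ast

# Token list as printed by the regex demo above
tokens = ['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))",
          'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]

# Assumes tokens alternate key, value, key, value, ...
pairs = dict(zip(tokens[::2], tokens[1::2]))


def to_python(value: str):
    """Hypothetical helper: parse literal-looking values, keep others as strings."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # e.g. 'I/out' parses as a division of two names, which is not a literal
        return value


parsed = {key: to_python(value) for key, value in pairs.items()}
print(parsed)
```

Keeping the raw strings in pairs and converting in a separate step makes it easy to skip the literal_eval conversion if plain strings are all you need.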
