
Split a string by comma except when in bracket and except when directly before and/or after the comma is a dash "-"?

Just trying to figure out how to split a string by comma, except when the comma is inside brackets AND except when a dash comes directly before and/or after the comma. I have already found some good solutions for the bracket problem, but I do not have any clue how to extend them to my problem.

Here is an example:

example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

So far, I have managed to do the following:

>>> re.split(r',\s*(?![^()]*\))', example_string)
>>> out: ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-', 'Leistungsrechnung', 'Berufsausbildung', '-fortbildung']

Note the difference between aim and out for the terms 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung'. Would be glad if someone could help me out such that the output looks like aim.

Thanks in advance!
Alex

If you can make use of the third-party Python regex module, you could do:

\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*

The pattern matches:

  • \([^()]*\) Match from an opening till closing parenthesis
  • (*SKIP)(*F) Skip the match
  • | Or
  • (?<!-)\s*,(?!\s*-)\s* Match a comma between optional whitespace chars to split on, unless a dash comes directly before the comma or follows it after optional whitespace

Regex demo

import regex

example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(regex.split(r"\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*", example_string))

Output

['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
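If installing the regex module is not an option, the skip-and-fail idea can be approximated with the standard-library re module. The following is only a sketch, not part of the answer above: the names TOKEN and split_outside_parens are made up for illustration. Parenthesised chunks are matched by the first alternative and ignored, and only separators land in capture group 1. The separator here is a comma that has no dash directly before it and no dash after it (optional whitespace allowed in between).

```python
import re

# First alternative consumes "(...)" chunks without capturing them.
# Second alternative captures a splitting comma in group 1.
TOKEN = re.compile(r"\([^()]*\)|((?<!-)\s*,(?!\s*-)\s*)")


def split_outside_parens(text):
    """Split on commas, ignoring those in parentheses or next to a dash."""
    parts, start = [], 0
    for match in TOKEN.finditer(text):
        if match.group(1) is None:
            # A parenthesised chunk was matched: keep it as part of the item.
            continue
        parts.append(text[start:match.start()])
        start = match.end()
    parts.append(text[start:])
    return parts


example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)',
       'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

print(split_outside_parens(example_string) == aim)  # True
```

Like the pattern it imitates, this sketch assumes the parentheses are not nested.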

You can use

re.split(r'(?<!-),(?!\s*-)\s*(?![^()]*\))', example_string)

See the Python demo. Details:

  • (?<!-) - a negative lookbehind that fails the match if there is a - char immediately to the left of the current location
  • , - a comma
  • (?!\s*-) - a negative lookahead that fails the match if there are zero or more whitespaces and then a - char immediately to the right of the current location
  • \s* - zero or more whitespaces
  • (?![^()]*\)) - a negative lookahead that fails the match if there are zero or more chars other than ) and ( and then a ) char immediately to the right of the current location.

See the regex demo, too.
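For anyone who wants to check this locally, here is a small self-contained sketch that spells the same pattern out with re.VERBOSE, so each piece carries its own comment, and compares the result with aim from the question. The name SPLIT_RE is just a label chosen for this example.

```python
import re

SPLIT_RE = re.compile(
    r"""
    (?<!-)          # no dash immediately before the comma
    ,               # the comma itself
    (?!\s*-)        # no dash after the comma, optional whitespace in between
    \s*             # swallow the whitespace that follows the comma
    (?![^()]*\))    # not followed by a closing parenthesis with no parenthesis before it
    """,
    re.VERBOSE,
)

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)',
       'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

result = SPLIT_RE.split(example_string)
print(result == aim)  # True
```

Note that the last lookahead only handles parentheses that are not nested.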
