I am trying to figure out how to split a string by comma, except when the comma is inside brackets and except when a dash comes directly before and/or after the comma. I have already found some good solutions for the bracket problem, but I have no idea how to extend them to my case.
Here is an example:
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
So far, I have managed to do the following:
>>> import re
>>> out = re.split(r',\s*(?![^()]*\))', example_string)
>>> out
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-', 'Leistungsrechnung', 'Berufsausbildung', '-fortbildung']
Note the difference between aim and out for the terms 'Kosten-, Leistungsrechnung' and 'Berufsausbildung, -fortbildung'. I would be glad if someone could help me get output that looks like aim.
Thanks in advance!
Alex
If you can use the third-party Python regex module (a PyPI package, not the built-in re), you could do:
\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*
The pattern matches:
\([^()]*\) - match from an opening parenthesis to the closing one
(*SKIP)(*F) - skip what was matched so far and fail, so commas inside parentheses are never split on
| - or
(?<!-)\s*,(?!\s*-)\s* - match a comma between optional whitespace chars to split on, unless a - comes directly before it or, after optional whitespace, directly after it
import regex
example_string = 'A-la-carte-Küche, Garnieren (Speisen, Getränke), Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung'
print(regex.split(r"\([^()]*\)(*SKIP)(*F)|(?<!-)\s*,(?!\s*-)\s*", example_string))
Output
['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)', 'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']
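If installing regex is not an option, the same idea of consuming parenthesised groups first can be approximated with the built-in re module. The following is a minimal sketch, not part of the answer above: the helper name split_terms and the capture-group check are assumptions made for illustration, and the comma rule (no - directly before the comma, no - after optional whitespace behind it) follows the requirement stated in the question.

```python
import re

# Two alternatives, tried in order at each position:
#   1. a parenthesised group (captured, so it can be recognised and ignored)
#   2. a comma with optional surrounding whitespace that does not touch a dash
TOKEN = re.compile(r'(\([^()]*\))|(?<!-)\s*,(?!\s*-)\s*')

def split_terms(text):
    """Split on commas outside parentheses that are not adjacent to a dash."""
    parts, last = [], 0
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            continue  # parenthesised group: consumed, never a split point
        parts.append(text[last:m.start()])
        last = m.end()
    parts.append(text[last:])  # remainder after the final split comma
    return parts

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
print(split_terms(example_string))
```

Because the parenthesised alternative consumes the whole group, the commas inside it are never offered to the second alternative. Like the pattern above, this sketch does not handle nested parentheses.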
You can use
re.split(r'(?<!-),(?!\s*-)\s*(?![^()]*\))', example_string)
Details:
(?<!-) - a negative lookbehind that fails the match if there is a - char immediately to the left of the current location
, - a comma
(?!\s*-) - a negative lookahead that fails the match if there is a - char, optionally preceded by whitespace, immediately to the right of the current location
\s* - zero or more whitespaces
(?![^()]*\)) - a negative lookahead that fails the match if there are zero or more chars other than ) and ( and then a ) char immediately to the right of the current location
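To check this pattern against the example from the question, it can be wrapped in a small function. This is only a sketch: the function name split_outside and the constant SPLIT_RE are hypothetical names chosen for illustration, and the pattern is compiled once so it can be reused.

```python
import re

# A comma that is not preceded by "-", not followed by optional whitespace
# plus "-", and not followed by a ")" before any other parenthesis appears.
SPLIT_RE = re.compile(r'(?<!-),(?!\s*-)\s*(?![^()]*\))')

def split_outside(text):
    """Split on commas outside parentheses that do not touch a dash."""
    return SPLIT_RE.split(text)

example_string = ('A-la-carte-Küche, Garnieren (Speisen, Getränke), '
                  'Kosten-, Leistungsrechnung, Berufsausbildung, -fortbildung')
aim = ['A-la-carte-Küche', 'Garnieren (Speisen, Getränke)',
       'Kosten-, Leistungsrechnung', 'Berufsausbildung, -fortbildung']

result = split_outside(example_string)
print(result)
print(result == aim)
```

The last lookahead only looks for a closing parenthesis with no other parenthesis in between, so it assumes the parentheses in the input are balanced and not nested.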