
Regex to split a string except in double quotes and keep the delimiters

I have been sifting through very similar questions but I am still stumped. I need to split a string on any non-alphanumeric character and keep the delimiters, except inside the parts of the string that are in double quotes. For example, given:

string = 'let a = 5 * (other) if x is "constant";'
re.split(pattern, string)

should yield:

['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

I am getting pretty close with:

re.split(r"(\W)", fragment)

(except for whitespace that I filter out separately) but I cannot manage the double quotes.
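
To make the problem concrete, here is a minimal sketch of that approach. The list-comprehension filter is only an assumption about how the whitespace gets dropped, and `split_keep_delims` is a name made up for the illustration:

```python
import re

def split_keep_delims(fragment):
    # The capturing group keeps each non-word delimiter in the result.
    parts = re.split(r"(\W)", fragment)
    # Assumed filtering step: drop empty strings and whitespace-only pieces.
    return [p for p in parts if p.strip()]

print(split_keep_delims('x is "hi there";'))
# ['x', 'is', '"', 'hi', 'there', '"', ';']
```

The quote characters come back as separate tokens and the quoted text is split on its inner space, which is the part that still needs solving.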

Any help appreciated.

You can use

import re
s = 'let a = 5 * (other) if x is "constant";'
print( re.findall(r'"[^"]*"|\w+|[^\w\s]', s) )

See the Python demo and the regex demo.

Details:

  • "[^"]*" - a ", then zero or more chars other than ", then a closing "
  • | - or
  • \w+ - one or more word chars
  • | - or
  • [^\w\s] - a single char that is neither a word char nor whitespace.
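
The order of the alternatives matters: the quoted-string branch comes first, so a quoted span is consumed as one token before the other two branches get a chance at it. A minimal sketch that wraps the pattern in a helper (`TOKEN_RE` and `tokenize` are hypothetical names, not part of the answer) and tries it on a quoted string containing spaces and punctuation:

```python
import re

# Quoted strings first, then runs of word chars, then single punctuation chars.
TOKEN_RE = re.compile(r'"[^"]*"|\w+|[^\w\s]')

def tokenize(text):
    """Return quoted strings, words, and punctuation chars in source order."""
    return TOKEN_RE.findall(text)

print(tokenize('let a = 5 * (other) if x is "constant";'))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

print(tokenize('say "hello, world" now'))
# ['say', '"hello, world"', 'now']
```

Whitespace is never matched by any branch, so it is skipped without a separate filtering step. Note that an unterminated quote falls through to the last branch and is returned as a lone " token.
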
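
Another option is to split on spaces, after an opening parenthesis, and before a closing parenthesis or a semicolon:

re.split(r'[ ]|(?<=[(])|(?=[);])', string)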

['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']
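
Two caveats on this split-based variant, as far as I can tell. It relies on re.split accepting a pattern that can match an empty string, which Python supports since version 3.7. It also only knows about spaces, parentheses, and semicolons, so a space inside a quoted string still causes a split. A small sketch to show both the matching case and the limits (`split_tokens` is a hypothetical helper name):

```python
import re

# Split on a space, after "(", or before ")" or ";" (needs Python 3.7+,
# because the lookaround alternatives match the empty string).
SPLIT_RE = re.compile(r'[ ]|(?<=[(])|(?=[);])')

def split_tokens(text):
    return SPLIT_RE.split(text)

print(split_tokens('let a = 5 * (other) if x is "constant";'))
# ['let', 'a', '=', '5', '*', '(', 'other', ')', 'if', 'x', 'is', '"constant"', ';']

print(split_tokens('x is "two words";'))
# ['x', 'is', '"two', 'words"', ';']

print(split_tokens('a=5'))
# ['a=5']
```

So this version fits the sample input, but it is not a general answer to "split on any non-alphanumeric character except inside double quotes".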
