Regular Expression to Match All Enclosed '' (2 Single Quotes)

I am looking for a regex that gives me a capture group for each pair of 2 single quotes ('') inside the single-quoted strings ('string') of a comma-separated list. For instance, the string 'tom''s' would have a single group between the m and the s. I've come close, but I keep getting tripped up: either the pattern erroneously matches the enclosing single quotes, or it captures only some of the 2-single-quote pairs within a string.

Example Input

'11','22'',','''33','44''','''55''','6''''6'

Desired Groups (7, shown in parens)

 '11','22(''),','('')33','44('')','('')55('')','6('')('')6'

For context, what I'm ultimately trying to do is replace these 2 single quotes within the comma-separated sequence of strings with another value, to make subsequent parsing easier.

Note also that the single-quoted strings may themselves contain commas.

Matching each '' on its own like that is not straightforward with the Python re module, because the enclosing quotes and the doubled quotes are the same character. Instead, you can match each single-quoted entry as a whole, capture its inner part, and use a lambda as the replacement so that a plain .replace rewrites the '' inside it:

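import re

# Match one whole quoted entry; group 1 is its inner text, '' pairs included.
p = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"
# Replace '' only inside each entry, then put the enclosing quotes back.
print(p.sub(lambda m: "'{}'".format(m.group(1).replace("''", "&")), test_str))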

Output (see the IDEONE demo): '11','22&,','&33','44&','&55&','6&&6'

The regex is '([^']*(?:''[^']*)*)':

  • ' - opening '
  • ( - Capture group #1 start
  • [^']* - zero or more non-' characters
  • (?:''[^']*)* - 0+ sequences of '' followed by 0+ non-' characters
  • ) - Capture group #1 end
  • ' - closing '
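
If the positions of the '' pairs are needed rather than a replaced string, the same pattern can be reused in two passes. The following is only a sketch under that assumption, not part of the original answer; the names ENTRY, escaped_quote_spans and unescaped_fields are made up for illustration. The first pass finds each quoted entry, the second finds '' inside the captured inner text and shifts the offsets back into the full string. The last helper shows one possible way to go straight to the parsed fields.

```python
import re

# Same pattern as above: one quoted entry, group 1 is its inner text.
ENTRY = re.compile(r"'([^']*(?:''[^']*)*)'")


def escaped_quote_spans(text):
    """Return (start, end) offsets in text of every '' inside a quoted entry."""
    spans = []
    for entry in ENTRY.finditer(text):
        inner = entry.group(1)
        base = entry.start(1)  # offset of the inner text within text
        for pair in re.finditer("''", inner):
            spans.append((base + pair.start(), base + pair.end()))
    return spans


def unescaped_fields(text):
    """Return the inner text of every entry with '' collapsed to a single '."""
    return [m.group(1).replace("''", "'") for m in ENTRY.finditer(text)]


sample = "'11','22'',','''33','44''','''55''','6''''6'"
print(len(escaped_quote_spans(sample)))  # 7, one per desired group
print(unescaped_fields(sample))
```

Because the inner text of an entry contains quotes only in '' pairs, the second pass cannot pick up an enclosing quote by mistake.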
