
Regular Expression to Match All Enclosed '' (2 Single Quotes)

I am looking for a regex that will provide me with capture groups for each set of 2 single quotes ( '' ) within the single-quoted strings ( 'string' ) that are part of a comma-separated list. For instance, the string 'tom''s' would have a single group between the m and the s. I've come close but keep getting tripped up, either by erroneously matching the enclosing single quotes or by capturing only some of the 2 single quotes within a string.

Example Input

'11','22'',','''33','44''','''55''','6''''6'

Desired Groups (7, shown in parens)

 '11','22(''),','('')33','44('')','('')55('')','6('')('')6'

For context, what I'm ultimately attempting to do is replace these 2 single quotes within the comma-separated sequence of strings with another value, in order to make subsequent parsing easier.

Note also that commas may be contained within the single-quoted strings.

You cannot match the doubled single quotes like that with Python's re module. Instead, you can match each single-quoted entry, capture its inner part, and, using a lambda as the replacement, swap out the '' inside with a mere .replace:

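import re

# Match one single-quoted entry; group 1 is its inner text (doubled '' allowed inside)
p = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"
# Re-wrap each entry in quotes after replacing every inner '' with a placeholder (&)
print(p.sub(lambda m: "'{}'".format(m.group(1).replace("''", "&")), test_str))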

See IDEONE demo, output: '11','22&,','&33','44&','&55&','6&&6'
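If what you need is the location of each '' rather than a replaced string, the same entry pattern can be reused. The following is only a sketch under that assumption: the helper name doubled_quote_spans is made up for illustration, and it simply offsets each inner '' match by the start of the entry's captured part.

```python
import re

# Same entry pattern as in the answer: one single-quoted entry, '' allowed inside.
entry = re.compile(r"'([^']*(?:''[^']*)*)'")
test_str = "'11','22'',','''33','44''','''55''','6''''6'"

def doubled_quote_spans(text):
    """Return (start, end) offsets in `text` of every '' found inside an entry.

    Hypothetical helper, not part of the re module.
    """
    spans = []
    for m in entry.finditer(text):
        offset = m.start(1)  # where the entry's inner part begins in `text`
        for q in re.finditer("''", m.group(1)):
            spans.append((offset + q.start(), offset + q.end()))
    return spans

spans = doubled_quote_spans(test_str)
print(len(spans))  # 7, matching the desired group count in the question
```

Each returned pair can be used to slice the original string, so the enclosing quotes of an entry are never reported as a match.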

The regex is '([^']*(?:''[^']*)*)':

  • ' - opening '
  • ( - Capture group #1 start
  • [^']* - zero or more non-' characters
  • (?:''[^']*)* - 0+ sequences of '' followed by 0+ non-' characters
  • ) - Capture group #1 end
  • ' - closing '
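The breakdown above can also be written into the pattern itself with re.VERBOSE, which some may find easier to maintain. This is a sketch rather than the only way to do it; the names ENTRY, replace_doubled_quotes and placeholder are chosen here for illustration and do not come from the original answer.

```python
import re

# The same regex, spelled out with inline comments via re.VERBOSE.
ENTRY = re.compile(r"""
    '               # opening quote
    (               # capture group 1 start
        [^']*       # zero or more non-quote characters
        (?:         # zero or more sequences of:
            ''      #   a doubled quote
            [^']*   #   followed by zero or more non-quote characters
        )*
    )               # capture group 1 end
    '               # closing quote
""", re.VERBOSE)

def replace_doubled_quotes(text, placeholder="&"):
    """Replace every '' inside a single-quoted entry with `placeholder`."""
    return ENTRY.sub(
        lambda m: "'" + m.group(1).replace("''", placeholder) + "'",
        text,
    )

print(replace_doubled_quotes("'11','22'',','''33','44''','''55''','6''''6'"))
```

Because the replacement is a function rather than a template string, the placeholder is inserted literally, so it may contain backslashes without being interpreted as a group reference.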
