Python regex: issues in skipping delimiter between quotes

I am new to regex and am trying to split a string using and/or as delimiters.

I used the solution provided in https://stackoverflow.com/a/18893443/5164936 and modified my regex as:

re.split(r'(\s+and\s+|\s+or\s+)(?=(?:[^"]*"[^"]*")*[^"]*$)', s)

which works like a charm for the majority of my use cases, except for the following input:

'col1 == "val1" or col2 == \'val1 and " val2\''

The split fails for this particular case, and I have tried modifying the regex above in different combinations with no luck. Can someone please help fix this regex?
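For reference, a minimal reproduction of the failure (a sketch; the names pattern, parts, and after_or are illustrative and not from the original post). The lookahead only accepts a position when an even number of double quotes follows it, and it does not treat single quotes as quoting at all, so the lone double quote inside the single-quoted value leaves an odd count after both candidate delimiters:

```python
import re

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
pattern = r'(\s+and\s+|\s+or\s+)(?=(?:[^"]*"[^"]*")*[^"]*$)'

# The lookahead requires an even number of '"' between the delimiter
# and the end of the string.
parts = re.split(pattern, s)

# Count the double quotes that follow the " or " delimiter.
after_or = s.split(' or ', 1)[1]
quotes_after_or = after_or.count('"')

print(parts)            # the string comes back unsplit, as a one-element list
print(quotes_after_or)  # an odd count, so the lookahead rejects " or "
```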

You may use a solution based on the PyPI regex module:

import regex

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
res = regex.split(r'''(?V1)(?:"[^"]*"|'[^']*')\K|(\s+(?:and|or)\s+)''', s)
print([x for x in res if x])
# => ['col1 == "val1"', ' or ', 'col2 == \'val1 and " val2\'']

See the Python demo online.

Details

  • (?V1) - a flag that enables version 1 behaviour, which allows splitting at zero-length matches
  • (?:"[^"]*"|'[^']*')\K - a substring between double or single quotation marks that is discarded from the match value using the \K match reset operator (thus, when this alternative matches, the match is an empty string located right after the quoted substring)
  • | - or
  • (\s+(?:and|or)\s+) - 1+ whitespace characters, then "and" or "or", and again 1+ whitespace characters, captured in group 1 so that the delimiter is kept in the result.
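If installing a third-party module is not an option, the same idea (consume quoted substrings whole and split only on delimiters found outside them) can be sketched with the standard re module and a small tokenizer. This is an alternative sketch, not the method from the answer above: the function name split_outside_quotes and the group names are hypothetical, and it assumes quotes are never escaped or nested.

```python
import re

# Each match is exactly one of: a whole quoted substring, an and/or
# delimiter with its surrounding whitespace, or any single character.
TOKEN = re.compile(
    r'''(?P<quoted>"[^"]*"|'[^']*')'''
    r'''|(?P<delim>\s+(?:and|or)\s+)'''
    r'''|(?P<other>.)''',
    re.DOTALL,
)

def split_outside_quotes(text):
    """Split text on and/or delimiters that are not inside quotes.

    Hypothetical helper: delimiters are kept in the returned list.
    """
    parts = []
    current = []
    for m in TOKEN.finditer(text):
        if m.lastgroup == 'delim':
            # Close the current chunk and keep the delimiter itself.
            parts.append(''.join(current))
            parts.append(m.group())
            current = []
        else:
            # Quoted substrings and ordinary characters extend the chunk.
            current.append(m.group())
    parts.append(''.join(current))
    return parts

s = 'col1 == "val1" or col2 == \'val1 and " val2\''
result = split_outside_quotes(s)
print(result)
```

Because the quoted alternative is tried first, an and/or that sits inside quotes is swallowed as part of the quoted token and never reaches the delimiter branch.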
