
Matching all Full Quotes with Regex

Matching quotes when you don't know whether they will be single or double is fairly easy:

>>> import re
>>> s = """this is a "test" that I am "testing" today"""
>>> re.findall('[\'"].*?[\'"]', s)
['"test"', '"testing"']

That searches a string for either single or double quotes and gets what is in between. But here is the issue:

It will close strings if they contain the other type of quote! Here are two examples to illustrate what I mean:

>>> s ="""this is a "test" and this "won't work right" at all"""
>>> re.findall('[\'"].*?[\'"]',s)
['"test"', '"won\'']
>>> s ="""something is "test" and this is "an 'inner' string" too"""
>>> re.findall('[\'"].*?[\'"]',s)
['"test"', '"an \'', '\' string"']

The regular expression '[\'"].*?[\'"]' will match a single quote with a double quote, which is clearly bad.

So what regular expression will match both types of quotes, but only match the actual string if it ends with the same kind of quote?

In case you're confused, here are my desired outputs:

s = """this is a "test" and this "won't work right" at all"""
re.findall(expression, s)
# prints ['"test"', '"won\'t work right"']

s = """something is "test" and this is "an 'inner' string" too"""
re.findall(expression, s)
# prints ['"test"', '"an \'inner\' string"', "'inner'"]

Wrap your first character class in a capturing group and then refer to it on the other side with \1:

>>> re.findall(r'([\'"])(.*?)\1',s)
[('"', 'test'), ('"', "won't work right")]
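
Because the pattern contains capturing groups, re.findall returns tuples of the groups rather than the whole quoted text. If you want the full strings with their quotes, as in the desired output, one option is to iterate with re.finditer and take group(0). This is only a minimal sketch: the helper name find_quoted is made up for illustration, and it does not handle backslash-escaped quotes inside a string.

```python
import re

# Same pattern as above: group 1 remembers which quote opened the string,
# and \1 requires the same quote character to close it.
QUOTED = re.compile(r'([\'"])(.*?)\1')

def find_quoted(text):
    """Return each quoted substring, including its surrounding quotes.

    find_quoted is a hypothetical helper name used only for this example.
    """
    return [m.group(0) for m in QUOTED.finditer(text)]

s1 = """this is a "test" and this "won't work right" at all"""
s2 = """something is "test" and this is "an 'inner' string" too"""

print(find_quoted(s1))  # ['"test"', '"won\'t work right"']
print(find_quoted(s2))  # ['"test"', '"an \'inner\' string"']
```

Note that matches do not overlap, so the nested 'inner' is not reported as a separate item; it only appears as part of the enclosing double-quoted string.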
