Regex to find all curly brackets within a quoted string
I have a string:
test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'
I would like to find only {0} and not {1} in this example, that is, the brackets themselves and their contents, but only when they appear within a set of double quotes.
I've started to solve this by matching the portion in double quotes:
(?<=").*(?=")
See https://regex101.com/r/qO0pO2/1
However, I am having difficulty matching only the {0} portion.
How can I extend this regex to match {0}?
You can try the non-word boundary \B and lookarounds, i.e.:
>>> import re
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'(?<=\B){.*?}(?=\B)', test_str)
['{0}', '{1}']
But if your string does not have such boundaries, then try lazy quantifier evaluation:
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'{.*?}', test_str)
['{0}', '{1}']
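To illustrate why the lazy quantifier matters here, the following sketch (not part of the original answer) compares greedy and lazy matching on the same string. The braces are escaped only for readability; the variable names `greedy` and `lazy` are illustrative.

```python
import re

test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"

# Greedy: .* runs to the last closing brace, so both groups merge into one match.
greedy = re.findall(r'\{.*\}', test_str)

# Lazy: .*? stops at the first closing brace, so each group is matched separately.
lazy = re.findall(r'\{.*?\}', test_str)

print(greedy)  # ['{0} a few {1}']
print(lazy)    # ['{0}', '{1}']
```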
EDIT
If you want only {0}, then you have to use the escape character (\) before the braces, since braces are regex tokens. Try as below:
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'\{0\}', test_str)
['{0}']
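If the literal text comes from a variable, `re.escape` can add the backslashes for you. This is a minimal sketch; `literal` and `matches` are hypothetical names, and note that this approach finds {0} anywhere in the string, whether or not it is inside quotes.

```python
import re

test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"

# Text to be matched literally; re.escape backslash-escapes regex metacharacters.
literal = "{0}"
pattern = re.escape(literal)

matches = re.findall(pattern, test_str)
print(matches)  # ['{0}']
```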
If the quotes are balanced, you could use a lookahead to check for an uneven number of quotes ahead. If you know that there is only one quoted substring, check that exactly one " occurs between the match and the end $:
{[^}]+}(?=[^"]*"[^"]*$)
See demo. But if there could be any number of quoted parts, check for an uneven number of quotes until the end:
{[^}]+}(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)
{[^}]+} matches the braced stuff: a literal { followed by [^}]+ (one or more characters that are not }) until the closing }
[^"]*" inside the lookahead matches up to the first quote
(?:[^"]*"[^"]*")* followed by zero or more balanced pairs of quotes, each preceded by any number of non-quote characters
[^"]*$ followed by any number of non-quote characters until the end
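As a quick check, the lookahead pattern can be tried in Python. This is only a sketch assuming balanced quotes; the names `INSIDE_QUOTES` and `braces_in_quotes` are made up for illustration, and the braces are escaped for readability.

```python
import re

# A braced token counts as "inside quotes" when an odd number of double
# quotes follows it up to the end of the string (assumes balanced quotes).
INSIDE_QUOTES = re.compile(r'\{[^}]+\}(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)')

def braces_in_quotes(text):
    """Return all {...} tokens that sit inside a double-quoted part."""
    return INSIDE_QUOTES.findall(text)

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'
print(braces_in_quotes(test_str))                    # ['{0}']
print(braces_in_quotes('a "{0}" b {1} "c {2}" d'))   # ['{0}', '{2}']
```

Because the lookahead rescans the rest of the string for every candidate, this may be slow on very long inputs.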
This might be difficult to do in one regex, but it's easy with two:
from re import findall

# First find all quoted strings...
for quoted in findall(r'"[^"]*"', test_str):
    # ...then find all bracketed expressions
    for match in findall(r'\{[^\}]*\}', quoted):
        print(match)
or as a one-liner (the outer loop over the quoted strings has to come first):
[match for quoted in findall(r'"[^"]*"', test_str) for match in findall(r'\{[^\}]*\}', quoted)]
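The two-step idea can also be wrapped in a small reusable function. This is a sketch only; the function name `braces_inside_quotes` is an assumption, not something from the answer.

```python
import re

def braces_inside_quotes(text):
    """Collect {...} tokens found inside double-quoted substrings."""
    results = []
    # Step 1: find every double-quoted substring.
    for quoted in re.findall(r'"[^"]*"', text):
        # Step 2: find every braced token within that substring.
        results.extend(re.findall(r'\{[^}]*\}', quoted))
    return results

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'
print(braces_inside_quotes(test_str))  # ['{0}']
```

Unlike the single-regex lookahead, this version scans the string once for quoted parts and does not depend on counting the quotes that follow each match.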