
Regex to find all curly brackets within a quoted string

I have a string:

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'

In this example I would like to find only {0} and not {1}, that is, the brackets themselves and their contents, but only when they appear within a set of double quotes.

I've started to solve this by matching the portion in double quotes:

(?<=").*(?=")

See https://regex101.com/r/qO0pO2/1

but I am having difficulty matching only the {0} portion.

How can I extend this regex to match {0}?

Remove the pipe | and it will work great (live demo).

And here is a version that allows multiple characters between {}:

(?<=)\{[^\}]*\}(?=)

With live demo.


Update:

This does the job:

".*({[^\}]*\}).*"

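As a rough check of the updated pattern, here is a minimal Python sketch that applies it unchanged to the question's string. The second input, `two_groups`, is an assumption added for illustration. It suggests a limitation: because the pattern has a single capture group after a greedy `.*`, only one braced group per match is captured.

```python
import re

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'

# The pattern from the update above, used verbatim.
pattern = re.compile(r'".*({[^\}]*\}).*"')

match = pattern.search(test_str)
print(match.group(1))  # {0}

# Hypothetical input (not from the question): two braced groups inside the quotes.
two_groups = 'a "b {0} c {1} d" e {2}'
# The greedy .* backtracks from the right, so only the last quoted group is captured.
print(pattern.search(two_groups).group(1))  # {1}
```

If every braced group inside the quotes is needed, a pattern with one capture group is probably not enough on its own.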
You can try the non-word boundary \B together with lookarounds, i.e.

>>> import re
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'(?<=\B){.*?}(?=\B)', test_str)
['{0}', '{1}']

See live demo.

But if your string does not have such boundaries, then try a lazy quantifier:

>>> import re
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'{.*?}', test_str)
['{0}', '{1}']

See live demo.
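To illustrate why the lazy quantifier matters here, the following sketch compares `.*?` with the greedy `.*` on the same string. The braces are escaped for clarity, and the variable names are only illustrative.

```python
import re

test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"

# Lazy: stop at the first closing brace after each opening brace.
lazy = re.findall(r'\{.*?\}', test_str)
# Greedy: run on to the last closing brace in the string.
greedy = re.findall(r'\{.*\}', test_str)

print(lazy)    # ['{0}', '{1}']
print(greedy)  # ['{0} a few {1}']
```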


EDIT

If you want only {0}, then match it literally and escape the braces with a backslash ( \ ), since braces are regex metacharacters. Try as below:

>>> import re
>>> test_str = "This is the string and it contains {0} a few {1} sets of curly brackets"
>>> re.findall(r'\{0\}', test_str)
['{0}']

If the quotes are balanced, you could use a lookahead to check for an odd number of quotes ahead. If you know that there is only one quoted substring, check that exactly one " occurs before the end $:

{[^}]+}(?=[^"]*"[^"]*$)

See demo. But if there could be any number of quoted parts, check for an odd number of quotes until the end:

{[^}]+}(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)
  • {[^}]+} matches the braced part: a literal { followed by [^}]+ (one or more characters other than }) up to the closing }
  • [^"]*" inside the lookahead matches up to the first quote
  • (?:[^"]*"[^"]*")* then matches zero or more balanced pairs of quotes, each quote preceded by any number of non-quote characters
  • [^"]*$ then matches any number of non-quote characters until the end

See demo at regex101.
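The two lookahead patterns above can be tried in Python roughly as follows. This is a sketch that assumes balanced quotes. The braces are escaped for clarity, and the `multi` string is a made-up input with two quoted parts, not taken from the question.

```python
import re

# Exactly one quote between the braces and the end of the string.
ONE_QUOTED = re.compile(r'\{[^}]+\}(?=[^"]*"[^"]*$)')
# An odd number of quotes between the braces and the end of the string.
ANY_QUOTED = re.compile(r'\{[^}]+\}(?=[^"]*"(?:[^"]*"[^"]*")*[^"]*$)')

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'
multi = 'x "a {0}" y {1} "b {2} c {3}" z {4}'  # hypothetical input

print(ONE_QUOTED.findall(test_str))  # ['{0}']
print(ANY_QUOTED.findall(test_str))  # ['{0}']
print(ONE_QUOTED.findall(multi))     # ['{2}', '{3}']
print(ANY_QUOTED.findall(multi))     # ['{0}', '{2}', '{3}']
```

With more than one quoted part, the simpler pattern misses {0} because three quotes follow it, whereas the odd-count version still finds it.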

It might be difficult to do in one regex, but it's easy with two:

from re import findall

# First find all quoted strings...
for quoted in findall(r'"[^"]*"', test_str):
    # ...then find all bracketed expressions
    for match in findall(r'\{[^\}]*\}', quoted):
        print(match)

or as a one-liner (the outer loop over the quoted strings must come first in the comprehension):

[match for quoted in findall(r'"[^"]*"', test_str) for match in findall(r'\{[^\}]*\}', quoted)]
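If the position of each match in the original string is also needed, the two-step idea could be wrapped in a small helper along these lines. The function name `braces_in_quotes` and the returned (offset, text) pairs are illustrative assumptions, not part of the answer above.

```python
import re

QUOTED = re.compile(r'"[^"]*"')    # a double-quoted substring
BRACED = re.compile(r'\{[^}]*\}')  # a braced group

def braces_in_quotes(text):
    """Return (offset, group) pairs for braced groups inside double-quoted substrings."""
    results = []
    for quoted in QUOTED.finditer(text):
        for braced in BRACED.finditer(quoted.group()):
            # Translate the offset within the quoted part back to the full string.
            results.append((quoted.start() + braced.start(), braced.group()))
    return results

test_str = 'This is the string and it "contains {0} a" few {1} sets of curly brackets'
found = braces_in_quotes(test_str)
print([group for _, group in found])  # ['{0}']
```

Like the two-regex answer, this assumes the double quotes are balanced and never escaped.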

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source.

 