Python regex: match anything enclosed in quotation marks, brackets, braces, or parentheses

UPDATE

This is still not a complete solution. It only handles values that end in a repeated closing character (e.g. )) , ]] , }} ). I'm still looking for a way to capture enclosed contents in general and will update this.

Code:

>>> import re
>>> re.search(r'(\(.+?[?<!)]\))', '((x(y)z))', re.DOTALL).groups()
('((x(y)z))',)

Details:

r'(\(.+?[?<!)]\))'
  • () - Capturing group.
  • \( and \) - The opening and closing characters (e.g. ' , " , () , {} , [] ).
  • .+? - Lazily match any characters (use with the re.DOTALL flag so that . also matches newlines).
  • [?<!)] - Intended as a negative lookbehind for the character ) (replace it with the matching closing character). As written, though, the square brackets make this a character class that matches a single ? , < , ! , or ) , so the pattern ends at the first ) that directly follows one of those characters. That is why it only works for repeated closing characters. An actual negative lookbehind is written (?<!\)) (more info here).
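
A quick way to check what [?<!)] really does is to compare it with the lookbehind spelling (?<!\)) on the same input. The snippet below is only a sketch using the standard re module; the variable names are made up for illustration.

```python
import re

text = '((x(y)z))'

# [?<!)] is a character class: it consumes one of '?', '<', '!' or ')'.
char_class = re.compile(r'\(.+?[?<!)]\)', re.DOTALL)

# (?<!\)) is a negative lookbehind: it consumes nothing and only checks
# that the character before the current position is not ')'.
lookbehind = re.compile(r'\(.+?(?<!\))\)', re.DOTALL)

print(char_class.search(text).group())   # ((x(y)z))
print(char_class.search('(ab)'))         # None: no '))' to end on
print(lookbehind.search(text).group())   # ((x(y)
```

Neither spelling balances the parentheses: the character class only finds a doubled closing character, and the lookbehind stops at the first ) that does not follow another ).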

I was trying to parse something like a variable assignment statement for a lexer I'm working on, just to understand the basic logic behind interpreters/compilers.

Here are the basic assignment statements and literals I'm dealing with:

az = none
az_ = true
az09 = false
az09_ = +0.9
az_09 = 'az09_'
_az09 = "az09_"
_az = [
  "az",
  0.9
]
_09 = {
  0: az
  1: 0.9
}
_ = (
  true
)

Somehow, I managed to parse the simple assignments such as none, true, false, and numeric literals. Here's where I'm currently stuck:

import sys
import re

# validate command-line arguments
if len(sys.argv) != 2:
    raise ValueError('usage: parse <script>')

# a variable name followed by '='
ASSIGNMENT_START = re.compile(r'\b\w+\s*=')
# the name, then a simple literal: none/true/false, a number, or a one-line quoted string
ASSIGNMENT = re.compile(
    r'\b(\w+)\s*=\s*(none|true|false|[-+]?\d+(?:\.\d+)?|\'[^\'\n]*\'|"[^"\n]*")'
)

# parse the variable name and its value
def handle_assignment(index, source):
    # TODO: handle quotations, brackets, braces, and parenthesis values
    variable = ASSIGNMENT.match(source, index)
    if variable is not None:
        print('{}={}'.format(variable.group(1), variable.group(2)))
        # the caller adds 1, so scanning resumes right after the matched value
        index = variable.end() - 1
    return index

# parse through the source element by element
with open(sys.argv[1]) as file:
    source = file.read()

index = 0
while index < len(source):
    # checks if the text at this position starts a variable assignment statement
    if ASSIGNMENT_START.match(source, index):
        index = handle_assignment(index, source)
    index += 1

I'm looking for a way to capture values enclosed in quotation marks, brackets, braces, and parentheses.

I'll update this post if I find an answer.

Use a regexp with one alternative for each matching pair.

re.match(r'\'.*?\'|".*?"|\(.*?\)|\[.*?\]|\{.*?\}', s)

Note, however, that if there are nested brackets, this will stop at the first closing bracket, e.g. if the input is

(words (and some more words))

the result will be

(words (and some more words)
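
As a rough illustration of the alternation above, the sketch below runs it over a few inputs taken from this thread. The helper name first_enclosed is made up for the example, and the re.DOTALL flag is an addition so that . can cross line breaks in the multi-line values from the question.

```python
import re

# One alternative per delimiter pair; re.DOTALL lets '.' span line breaks.
ENCLOSED = re.compile(r'\'.*?\'|".*?"|\(.*?\)|\[.*?\]|\{.*?\}', re.DOTALL)

def first_enclosed(text):
    """Return the first delimited value found in text, or None."""
    found = ENCLOSED.search(text)
    return found.group(0) if found else None

print(first_enclosed('_az09 = "az09_"'))                  # "az09_"
print(repr(first_enclosed('_ = (\n  true\n)')))           # '(\n  true\n)'
print(first_enclosed('(words (and some more words))'))    # (words (and some more words)
```

The last call reproduces the nested case: the lazy .*? ends at the first closing parenthesis, so the outer one is left behind.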

Regular expressions are not appropriate for matching nested structures; you should use a more powerful parsing technique.

Solution for the nested (recursive) case that @Barmar mentions, using the third-party regex module:
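
One simple technique of that kind is to scan character by character and keep a stack of the closing characters still expected. The function below is a minimal sketch, not a full parser: read_enclosed, PAIRS, and QUOTES are hypothetical names, and it assumes string literals contain no escaped quotes.

```python
PAIRS = {'(': ')', '[': ']', '{': '}'}
QUOTES = '\'"'

def read_enclosed(source, start):
    """Return the index just past the value that opens at source[start].

    Assumes source[start] is an opening bracket or a quote character.
    A stack tracks the expected closing characters, and quoted strings
    are skipped whole so that brackets inside them are ignored.
    """
    stack = []
    i = start
    while i < len(source):
        ch = source[i]
        if ch in QUOTES:
            end = source.find(ch, i + 1)   # jump to the matching quote
            if end == -1:
                raise ValueError('unterminated string at index {}'.format(i))
            i = end
        elif ch in PAIRS:
            stack.append(PAIRS[ch])        # remember which closer we need
        elif stack and ch == stack[-1]:
            stack.pop()
        elif ch in PAIRS.values():
            raise ValueError('unexpected {!r} at index {}'.format(ch, i))
        i += 1
        if not stack:
            return i                       # every opener has been closed
    raise ValueError('unbalanced value starting at index {}'.format(start))

text = '(words (and some more words)) tail'
print(text[:read_enclosed(text, 0)])       # (words (and some more words))
```

In the lexer from the question, a helper like this could be called once the assignment regex has consumed the name and the = sign, with start pointing at the opening character of the value.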

pip install regex
python3
>>> import regex
>>> recurParentheses = regex.compile(r'[(](?:[^()]|(?R))*[)]')
>>> recurParentheses.findall('(z(x(y)z)x) ((x)(y)(z))')
['(z(x(y)z)x)', '((x)(y)(z))']
>>> recurCurlyBraces = regex.compile(r'[{](?:[^{}]|(?R))*[}]')
>>> recurCurlyBraces.findall('{z{x{y}z}x} {{x}{y}{z}}')
['{z{x{y}z}x}', '{{x}{y}{z}}']
>>> recurSquareBrackets = regex.compile(r'[[](?:[^][]|(?R))*[]]')
>>> recurSquareBrackets.findall('[z[x[y]z]x] [[x][y][z]]')
['[z[x[y]z]x]', '[[x][y][z]]']

For string literal recursion, I suggest taking a look at this.
