
How to get all matching iterations for a capture group

I made this regular expression and use it with re.findall():

SELECT.*{(?:\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\][,]{0,1}){1,}}.*

Made using https://jex.im

to match these lists of strings:

["dimSales","Product Title","All"], ["test","Product Title","All"]

in this haystack:

SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS FROM [Model] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

My regex only matches the last iteration of the outer capturing group:

["test","Product Title","All"]

What do I need to change so that re.findall() returns all iterations, not just the last iteration of the outer capturing group?
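For reference, the behaviour described above can be reproduced with a short script. This is only a sketch: the variable names are made up here, and the haystack is shortened to the part of the query that matters. In Python's re module, a capturing group inside a repeated group keeps only the text from its last repetition, which is why a single tuple comes back.

```python
import re

# Pattern copied from the question.
pattern = r"SELECT.*{(?:\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\][,]{0,1}){1,}}.*"

# Shortened haystack (assumption: the trailing clauses do not affect the result).
haystack = (
    "SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],"
    "[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) ON COLUMNS FROM [Model]"
)

# The repeated (?:...) group runs twice, but groups 1-3 keep only the last pass.
result = re.findall(pattern, haystack)
print(result)
```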

What about this regex:

(\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\],\s*\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\])

demo:

https://regex101.com/r/LaddaK/2/

Explanations:

  • the parentheses () form your capturing group; they can be removed if not necessary
  • \[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\] matches a literal open bracket followed by a double quote, then 0 to N non-double-quote characters ( [^\"]* ), followed by a double quote and a comma. You might have to surround all commas with \s* if you want to accept whitespace around them.
  • the pattern \"[^\"]*\" is repeated another 2 times to match the 3 words enclosed in the brackets (you might have to adapt it to \w* depending on your exact constraints on the strings).
  • the whole block \[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\] is repeated after a ,\s* to accept the whole pattern made of 2 blocks of brackets.
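The bullets above can be tried out in Python roughly as follows. This is a sketch, not part of the original answer: the names text, two_blocks and one_block are invented for illustration, and the second pattern is a variation that captures each word of a single block separately.

```python
import re

text = '["dimSales","Product Title","All"], ["test","Product Title","All"]'

# Pattern from the answer: two bracketed blocks separated by a comma.
# (\" and " mean the same thing in a regex, so the escapes are dropped here.)
two_blocks = r'(\["[^"]*","[^"]*","[^"]*"\],\s*\["[^"]*","[^"]*","[^"]*"\])'
whole = re.findall(two_blocks, text)

# Variation (an assumption, not in the answer): a single block with one
# capturing group per word, so findall returns one tuple per block.
one_block = r'\["([^"]*)","([^"]*)","([^"]*)"\]'
triples = re.findall(one_block, text)

print(whole)
print(triples)
```

With the single-block variation the number of blocks no longer has to be exactly two.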

Notes:

  • You might want to surround your regex with anchors ( ^ and $ )

  • I don't know your exact constraints, but if you want to analyse some JSON or parse any other format with arbitrarily deep nested patterns repeating themselves (e.g. fractals), you should not use regex.
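As an illustration of the last note, if the input really is JSON-like, a parser can do the work instead of a regex. The sketch below assumes the quoted sample from the question is valid JSON once it is wrapped in an outer pair of brackets; the names text and rows are hypothetical.

```python
import json

text = '["dimSales","Product Title","All"], ["test","Product Title","All"]'

# Wrap the comma-separated lists in [ ] so the whole thing is one JSON array.
rows = json.loads("[" + text + "]")
print(rows)
```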

EDIT after change of requirements:

import re

inputStr = '[dimSales,Product Title,All], [test,Product Title,All]'
print(re.findall(r'\[(?:[a-zA-Z0-9 ]*)(?:,[a-zA-Z0-9 ]*)*\]', inputStr))

output:

['[dimSales,Product Title,All]', '[test,Product Title,All]']

import re

string = "SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS FROM [Model] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS"

print(re.findall(r"(?:SELECT .+\({|,)\[([\w ]+)\]\.\[([\w ]+)\]\.\[([\w ]+)\](?=[^}]*})", string))

Output:

[('dimSales', 'Product Title', 'All'), ('test', 'Product Title', 'All')]

Explanation:

(?:SELECT .+\({|,)      # non-capturing group: match SELECT followed by a space and 1 or more of any character, then ({, OR match a comma
\[([\w ]+)\]            # group 1: 1 or more word characters or spaces inside square brackets
\.                      # a dot
\[([\w ]+)\]            # group 2: 1 or more word characters or spaces inside square brackets
\.                      # a dot
\[([\w ]+)\]            # group 3: 1 or more word characters or spaces inside square brackets
(?=[^}]*})              # positive lookahead: make sure a closing curly bracket follows, with no other closing curly bracket before it
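The commented breakdown above maps almost directly onto Python's re.VERBOSE mode, which allows whitespace and # comments inside the pattern itself. The following is a sketch under two assumptions: the space after SELECT is written as [ ] because verbose mode ignores bare spaces outside character classes, and the literal braces are escaped for readability. The names pattern, query and matches are illustrative only.

```python
import re

pattern = re.compile(r"""
    (?:SELECT[ ].+\(\{|,)   # SELECT, a space, any characters, then ({  OR a comma
    \[([\w ]+)\]            # group 1: word characters or spaces in square brackets
    \.                      # a dot
    \[([\w ]+)\]            # group 2
    \.                      # a dot
    \[([\w ]+)\]            # group 3
    (?=[^}]*\})             # lookahead: a closing brace follows, none before it
""", re.VERBOSE)

query = (
    "SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],"
    "[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) DIMENSION PROPERTIES "
    "PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS FROM [Model]"
)

matches = pattern.findall(query)
print(matches)
```

Each match is one complete [a].[b].[c] item, so findall returns one tuple per item rather than only the last repetition.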
