How to get all matching iterations for a capture group
I made this regular expression and use it with re.findall():
SELECT.*{(?:\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\]\.\[([a-zA-Z0-9 ]*)\][,]{0,1}){1,}}.*
to match these lists of strings:
["dimSales","Product Title","All"], ["test","Product Title","All"]
in this haystack:
SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS FROM [Model] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
My regex only matches the last iteration of the repeated outer group:
["test","Product Title","All"]
What do I need to change so that re.findall() returns all iterations, not just the last iteration of the outer group?
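To illustrate the behaviour described in the question, here is a minimal sketch on a deliberately simplified input. The string `text` and the two patterns are illustrative stand-ins, not the asker's original regex. A capture group inside a repeated group keeps only its last repetition, whereas a pattern that matches one item per match lets re.findall() return every item.

```python
import re

# Simplified stand-in for the question's input (hypothetical data).
text = "{[a],[b],[c]}"

# One match covers the whole set; the repeated group is overwritten on
# each repetition, so only the last captured value survives.
repeated = re.findall(r"{(?:\[(\w+)\],?)+}", text)

# One match per bracketed item, so findall collects all of them.
per_item = re.findall(r"\[(\w+)\]", text)

print(repeated)  # ['c']
print(per_item)  # ['a', 'b', 'c']
```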
What about this regex:
(\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\],\s*\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\])
Demo: https://regex101.com/r/LaddaK/2/
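As a rough check, the pattern above can be run in Python against the quoted sample from the question. The variable names are illustrative. The per-block variant at the end is an assumption about what the asker ultimately wants (one tuple per bracketed list) and is not part of the answer's pattern.

```python
import re

input_str = '["dimSales","Product Title","All"], ["test","Product Title","All"]'

# The answer's pattern, copied as-is: two bracketed blocks in one capture group.
pattern = r'(\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\],\s*\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\])'
matches = re.findall(pattern, input_str)
print(matches)  # a single match spanning both blocks

# Variant (assumption): match one block at a time and capture its three strings.
per_block = re.findall(r'\["([^"]*)","([^"]*)","([^"]*)"\]', input_str)
print(per_block)
# [('dimSales', 'Product Title', 'All'), ('test', 'Product Title', 'All')]
```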
Explanations:
() to have your capturing group; it can be removed if not necessary.
\[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\] matches a literal opening bracket, then three double-quoted strings separated by commas (each one a double quote, 0 to N non-double-quote characters [^\"]*, and a closing double quote), then a literal closing bracket. You might have to surround all commas with \s* if you want to accept whitespace around them.
\"[^\"]*\" matches each of the 3 quoted words inside the brackets (you might have to adapt [^\"]* into \w*, depending on your exact constraints on the strings).
A second \[\"[^\"]*\",\"[^\"]*\",\"[^\"]*\"\] after a ,\s* accepts the whole pattern made of 2 bracketed blocks.
Notes:
You might want to surround your regex with anchors (^ and $).
I don't know your exact constraints, but if you want to analyse JSON or parse any other format with arbitrarily nested, self-repeating patterns (e.g. fractals), you should not use regex.
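The two notes can be sketched as follows, assuming the input really is the quoted sample from the question. re.fullmatch is used here as a close stand-in for wrapping the pattern in ^ and $, and the json module replaces the regex entirely once the sample is wrapped in an outer pair of brackets. All variable names are illustrative.

```python
import json
import re

input_str = '["dimSales","Product Title","All"], ["test","Product Title","All"]'

# Note 1: anchoring. fullmatch only succeeds if the whole string matches,
# which is roughly what surrounding the pattern with ^ and $ achieves.
block = r'\["[^"]*","[^"]*","[^"]*"\]'
two_blocks = block + r',\s*' + block
is_exact = re.fullmatch(two_blocks, input_str) is not None
has_junk = re.fullmatch(two_blocks, 'x' + input_str) is not None
print(is_exact, has_junk)  # True False

# Note 2: the sample becomes a valid JSON array once wrapped in [ ],
# so a real parser can handle it without any regex.
rows = json.loads('[' + input_str + ']')
print(rows)
# [['dimSales', 'Product Title', 'All'], ['test', 'Product Title', 'All']]
```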
EDIT after change of requirements:
import re
inputStr = '[dimSales,Product Title,All], [test,Product Title,All]'
print(re.findall(r'\[(?:[a-zA-Z0-9 ]*)(?:,[a-zA-Z0-9 ]*)*\]', inputStr))
Output:
['[dimSales,Product Title,All]', '[test,Product Title,All]']
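If the individual fields are needed rather than the whole bracketed strings, one possible follow-up is to capture the text between the brackets and split it on commas. This is a sketch that assumes the fields never contain commas themselves; the names `inner` and `rows` are illustrative.

```python
import re

input_str = '[dimSales,Product Title,All], [test,Product Title,All]'

# Same character classes as the snippet above, but the content between
# the brackets is now captured as a single group.
inner = re.findall(r'\[([a-zA-Z0-9 ]*(?:,[a-zA-Z0-9 ]*)*)\]', input_str)

# Split each captured string into its comma-separated fields.
rows = [item.split(',') for item in inner]
print(rows)
# [['dimSales', 'Product Title', 'All'], ['test', 'Product Title', 'All']]
```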
import re
string = "SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS FROM [Model] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS"
print(re.findall(r"(?:SELECT .+\({|,)\[([\w ]+)\]\.\[([\w ]+)\]\.\[([\w ]+)\](?=[^}]*})", string))
Output:
[('dimSales', 'Product Title', 'All'), ('test', 'Product Title', 'All')]
Explanation:
(?:SELECT .+\({|,) # non-capturing group: match SELECT followed by 1 or more of any character, then ({ OR a comma
\[([\w ]+)\] # group 1, 1 or more word character or space inside square brackets
\. # a dot
\[([\w ]+)\] # group 2, 1 or more word character or space inside square brackets
\. # a dot
\[([\w ]+)\] # group 3, 1 or more word character or space inside square brackets
(?=[^}]*}) # positive lookahead: make sure a closing curly bracket follows, with no other closing curly bracket before it
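The commented breakdown above maps naturally onto Python's re.VERBOSE flag, which allows whitespace and comments inside the pattern itself. The sketch below is one way to write it. The names `mdx` and `TRIPLE` are hypothetical, and the query is a shortened version of the one in the question. Because verbose mode ignores unescaped spaces outside character classes, the space after SELECT is written as [ ].

```python
import re

# Shortened form of the query from the question (illustrative).
mdx = ("SELECT NON EMPTY Hierarchize({DrilldownLevel({[dimSales].[Product Title].[All],"
       "[test].[Product Title].[All]},,,INCLUDE_CALC_MEMBERS)}) ON COLUMNS FROM [Model]")

TRIPLE = re.compile(r"""
    (?:SELECT[ ].+\({|,)   # SELECT, any characters, then an opening paren and brace; OR a comma
    \[([\w ]+)\]           # group 1: word characters or spaces inside square brackets
    \.                     # a dot
    \[([\w ]+)\]           # group 2
    \.                     # a dot
    \[([\w ]+)\]           # group 3
    (?=[^}]*})             # lookahead: a closing curly bracket must follow
""", re.VERBOSE)

result = TRIPLE.findall(mdx)
print(result)
# [('dimSales', 'Product Title', 'All'), ('test', 'Product Title', 'All')]
```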