
Python regex optional number match returns more than expected

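I have a list of files and I am trying to filter the subset of file names that end in 000000, 060000, 120000, or 180000. I know I could do a plain string match, but I want to understand why the regex I tried, r'[00|06|12|18]+0000' below, does not work (it also returns MSM_20130519210000.csv). I want it to match any one of 00, 06, 12, 18, followed by 0000. How can I achieve that? Thanks, and please keep answers to the intended regex rather than other features.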

Here is the code snippet:

import re

files_in_input_directory = ['MSM_20130519150000.csv', 'MSM_20130519180000.csv', 'MSM_20130519210000.csv', 
'MSM_20130520000000.csv', 'MSM_20130520030000.csv', 'MSM_20130520060000.csv', 'MSM_20130520090000.csv', 
'MSM_20130520120000.csv', 'MSM_20130520150000.csv', 'MSM_20130520180000.csv', 'MSM_20130520210000.csv', 
'MSM_20130521000000.csv', 'MSM_20130521030000.csv', 'MSM_20130521060000.csv', 'MSM_20130521090000.csv', 
'MSM_20130521120000.csv', 'MSM_20130521150000.csv', 'MSM_20130521180000.csv', 'MSM_20130521210000.csv', 
'MSM_20130522000000.csv', 'MSM_20130522030000.csv', 'MSM_20130522060000.csv', 'MSM_20130522090000.csv', 
'MSM_20130522120000.csv', 'MSM_20130522150000.csv', 'MSM_20130522180000.csv', 'MSM_20130522210000.csv', 
'MSM_20130523000000.csv', 'MSM_20130523030000.csv', 'MSM_20130523060000.csv', 'MSM_20130523090000.csv', 
'MSM_20130523120000.csv', 'MSM_20130523150000.csv', 'MSM_20130523180000.csv', 'MSM_20130523210000.csv', 
'MSM_20130524000000.csv', 'MSM_20130524030000.csv', 'MSM_20130524060000.csv', 'MSM_20130524090000.csv', 
'MSM_20130524120000.csv', 'MSM_20130524150000.csv', 'MSM_20130524180000.csv', 'MSM_20130524210000.csv', 
'MSM_20130525000000.csv', 'MSM_20130525030000.csv', 'MSM_20130525060000.csv', 'MSM_20130525090000.csv', 
'MSM_20130525120000.csv', 'MSM_20130525150000.csv', 'MSM_20130525180000.csv', 'MSM_20130525210000.csv', 
'MSM_20130526000000.csv', 'MSM_20130526030000.csv', 'MSM_20130526060000.csv', 'MSM_20130526090000.csv', 
'MSM_20130526120000.csv', 'MSM_20130526150000.csv', 'MSM_20130526180000.csv', 'MSM_20130526210000.csv', 
'MSM_20130527000000.csv', 'MSM_20130527030000.csv', 'MSM_20130527060000.csv', 'MSM_20130527090000.csv', 
'MSM_20130527120000.csv', 'MSM_20130527150000.csv', 'MSM_20130527180000.csv', 'MSM_20130527210000.csv', 
'MSM_20130528000000.csv', 'MSM_20130528030000.csv', 'MSM_20130528060000.csv', 'MSM_20130528090000.csv', 
'MSM_20130528120000.csv', 'MSM_20130528150000.csv', 'MSM_20130528180000.csv', 'MSM_20130528210000.csv', 
'MSM_20130529000000.csv', 'MSM_20130529030000.csv', 'MSM_20130529060000.csv', 'MSM_20130529090000.csv']

print(files_in_input_directory)
print("\n")

# Trying to match any string with 000000, 060000, 120000, 180000.
# Question: I use + meaning one or more, and | to indicate the options, but this will match
# 'MSM_20130519210000.csv' as well, and I don't know why.
print(list(filter(lambda x: re.search(r'[00|06|12|18]+0000', x), files_in_input_directory)))
print("\n")

# This verbose version works
print(list(filter(lambda x: re.search(r'000000|060000|120000|180000', x), files_in_input_directory)))
print("\n")

If you want to match file names containing 000000, 060000, 120000, or 180000, then instead of

re.search(r'[00|06|12|18]+0000', x)

use

re.search(r'(00|06|12|18)0000', x)
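As a quick check of the grouped pattern, here is a minimal sketch that applies it to a handful of names copied from the question's list. The variable names sample, pattern, and matched are chosen only for this illustration.

```python
import re

# A small sample taken from the question's file list.
sample = [
    'MSM_20130519150000.csv',
    'MSM_20130519180000.csv',
    'MSM_20130519210000.csv',
    'MSM_20130520000000.csv',
    'MSM_20130520030000.csv',
    'MSM_20130520060000.csv',
    'MSM_20130520120000.csv',
]

# Parentheses group the alternatives, so exactly one of
# 00, 06, 12, 18 must appear directly before 0000.
pattern = re.compile(r'(00|06|12|18)0000')

matched = [name for name in sample if pattern.search(name)]
print(matched)
```

On this sample, the 15, 21, and 03 o'clock files are left out and the other four are kept.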

Square brackets [...] match only one character at a time, and the + character means "match one or more of the preceding expression".

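[00|06|12|18] is a character set that matches any single one of the characters 0, 1, 2, 6, 8, or |. So it will match 210000 in 'MSM_20130519210000.csv', because [00|06|12|18] is equivalent to writing [01268|]. I don't think that is what you meant.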
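The character-set behaviour can be seen directly with a short sketch. The names char_class and same_class are made up for this illustration, and the string 'x|0000' is an artificial input used only to show that the pipe is treated as a literal character inside the brackets.

```python
import re

# The pattern from the question and its spelled-out equivalent.
char_class = re.compile(r'[00|06|12|18]+0000')
same_class = re.compile(r'[01268|]+0000')

name = 'MSM_20130519210000.csv'

# '2' and '1' are both members of the set, so "21" satisfies [...]+
# and the following "0000" completes the match.
hit_original = char_class.search(name).group()
hit_equivalent = same_class.search(name).group()
print(hit_original, hit_equivalent)

# '5' is not in the set, so the 15 o'clock file is not matched.
miss = char_class.search('MSM_20130519150000.csv')
print(miss)

# Inside [...], '|' is just another literal character.
pipe_hit = char_class.search('x|0000').group()
print(pipe_hit)
```

This also explains why the 03, 09, and 15 o'clock files were filtered out in the question while the 21 o'clock ones were not: 3, 5, and 9 are not in the set, but 2 and 1 are.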

Instead of a character set that can match one or more times, make it a capturing group:

r'(00|06|12|18)0000'

or a positive lookbehind expression:

r'(?<=00|06|12|18)0000'

For your purposes they are equivalent, since you only care whether a name matches, not about the matched text or any groups.
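The difference between the two forms only shows up in what the match object contains. A small sketch, with variable names chosen for this illustration and a few names copied from the question:

```python
import re

group_pat = re.compile(r'(00|06|12|18)0000')
lookbehind_pat = re.compile(r'(?<=00|06|12|18)0000')

name = 'MSM_20130520060000.csv'

g = group_pat.search(name)
lb = lookbehind_pat.search(name)

# The group consumes the hour digits and records them as group 1.
print(g.group(), g.group(1))
# The lookbehind only checks the two characters before "0000";
# they are not part of the matched text.
print(lb.group())

# As filters, both patterns select the same names.
sample = [
    'MSM_20130519180000.csv',
    'MSM_20130519210000.csv',
    'MSM_20130520000000.csv',
    'MSM_20130520030000.csv',
]
by_group = [n for n in sample if group_pat.search(n)]
by_lookbehind = [n for n in sample if lookbehind_pat.search(n)]
print(by_group == by_lookbehind)
```

Python's re module requires a lookbehind to have a fixed width; this one is accepted because every alternative is exactly two characters long.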

The basic problem here is that you are not grouping your patterns; by writing [...] you are creating a character set.

This regex works: ((00)|(06)|(12)|(18))0000
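For filtering alone, the capture groups are optional. Below is a minimal sketch that uses a non-capturing group and anchors the match at the end of the name. It assumes the timestamp always sits directly before the .csv extension, which is inferred from the sample names and not stated in the question. The file name MSM_20131206000030.csv is hypothetical, invented only to show what the anchor changes.

```python
import re

# (?:...) groups the alternatives without recording a capture.
# \.csv$ ties the match to the end of the file name (an assumption
# about the naming scheme).
anchored = re.compile(r'(?:00|06|12|18)0000\.csv$')
unanchored = re.compile(r'(00|06|12|18)0000')

wanted = 'MSM_20130520060000.csv'
unwanted = 'MSM_20130519210000.csv'
# Hypothetical name: 6 December, 00:00:30. The date and hour digits
# happen to spell "060000" in the middle of the timestamp.
tricky = 'MSM_20131206000030.csv'

anchored_results = [bool(anchored.search(n)) for n in (wanted, unwanted, tricky)]
unanchored_results = [bool(unanchored.search(n)) for n in (wanted, unwanted, tricky)]
print(anchored_results)
print(unanchored_results)
```

For the file list in the question both patterns select the same names; the anchor only matters if date digits can combine with the time digits in the way the hypothetical name does.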

