Find which part of a string does not match a regular expression in Python
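To check whether a file name is correctly formed (using re), I use the following regular expression pattern:
*"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"*
Here is a correct file name: *S_hc_1.2.3_2014-213T123121.xml*
Here is an incorrect file name: *S_hc_1.2.IncorrectName_2014-213T123121.xml*
I would like to know whether there is a simple way to retrieve the part of the file name that does not match the regular expression.
In the end, an error message would be displayed: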
Error, incorrect file name, the part 'IncorrectName' does not match with expected name.
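You can use re.split together with next on a generator expression, but you also need to check that your string has the overall structure you expect, which you can do with the following re.match:
re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)
In code: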
>>> import re
>>> s1 = 'S_hc_1.2.3_2014-213T123121.xml'
>>> s2 = 'S_hc_1.2.IncorrectName_2014-213T123121.xml'
>>> # with s1: every piece is empty, so next() returns None and nothing is echoed
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s1) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s1)),None)
>>> # with s2
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)),None)
'IncorrectName'
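You only need to put a pipe (|) between the distinct parts of your regex pattern; the split function will then split the string on whichever of those alternatives matches.
Any part of the string that matches none of the alternatives is not consumed by the split, so you can find it by looping over the pieces that split returns!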
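To make the effect of the alternation concrete, here is a small sketch that looks at every piece re.split returns for both file names. The pattern is the one used above; the variable names SPLIT_RX and leftovers are mine, not from the answer.

```python
import re

# Same alternation as above: each alternative is one valid building block.
SPLIT_RX = r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$'

s1 = 'S_hc_1.2.3_2014-213T123121.xml'
s2 = 'S_hc_1.2.IncorrectName_2014-213T123121.xml'


def leftovers(name):
    """Return the non-empty pieces that none of the alternatives consumed."""
    # Text consumed by a match disappears from the result; only text that
    # matched no alternative survives as a non-empty piece.
    return [piece for piece in re.split(SPLIT_RX, name) if piece]


print(leftovers(s1))  # []
print(leftovers(s2))  # ['IncorrectName']
```

Note that this only tells you which characters fit none of the building blocks; it does not check that the blocks appear in the right order, which is why the answer pairs it with re.match.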
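next(iterator[, default])
Return the next item from the iterator by calling its next() method. If default is given, it is returned when the iterator is exhausted; otherwise StopIteration is raised.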
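As a quick illustration of why the default matters here, the following sketch wraps the same idiom in a helper. The name first_nonempty is hypothetical and only used for this example.

```python
def first_nonempty(pieces):
    """Return the first non-empty string in pieces, or None if there is none."""
    # Without the second argument, next() would raise StopIteration
    # when every piece is empty (i.e. when the file name is fine).
    return next((p for p in pieces if p), None)


print(first_nonempty(['', '', 'IncorrectName', '']))  # IncorrectName
print(first_nonempty(['', '']))                       # None
```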
If you prefer to write it over several lines:
>>> for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2):
...     if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2):
...         print(i)
...
IncorrectName
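This may be a longer solution, but it tells you what failed and what was expected. It is similar to Kasra's solution: break the file name into individual bits and match them one after another. That lets you find out where the match breaks: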
import re
from functools import reduce  # reduce is not a builtin in Python 3

# break up the big file name pattern into individual bits that we can match
RX = re.compile
pattern = [
    RX(r"\*"),  # leading '*': the test names below are wrapped in asterisks
    RX(r"S_hc_"),
    RX(r"[0-9]{1,2}"),
    RX(r"\."),
    RX(r"[0-9]{1,2}"),
    RX(r"\."),
    RX(r"[0-9]{1,2}"),
    RX(r"_"),
    RX(r"[0-9]{4}"),
    RX(r"-"),
    RX(r"[0-9]{1,3}"),
    RX(r"T"),
    RX(r"[0-9]{6}"),
    RX(r"\.xml"),
    RX(r"\*")  # trailing '*'
]

# 'fn' is the part of the file name that has not been matched yet
def reductor(fn, rx):
    if fn is None:
        return None
    mo = rx.match(fn)
    if mo is None:
        print("File name mismatch: got {}, expected {}".format(fn, rx.pattern))
        return None
    # proceed with the remainder of the string
    return fn[mo.end():]

validFile = lambda fn: reduce(reductor, pattern, fn) is not None
Let's test it:
print(validFile("*S_hc_1.2.3_2014-213T123121.xml*"))
print(validFile("*S_hc_1.2.IncorrectName_2014-213T123121.xml*"))
Output:
True
File name mismatch: got IncorrectName_2014-213T123121.xml*, expected [0-9]{1,2}
False
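If you would rather get the mismatch back as a value than have it printed, the same idea can be sketched without reduce. This is only a sketch: the name find_mismatch and the tuple it returns are my own choices, and the leading and trailing asterisk patterns are dropped on the assumption that real file names do not contain them. It also reports leftover characters after the last piece, which the version above does not check.

```python
import re

# Sub-patterns in the order they must appear (asterisks omitted: assumption).
PARTS = [re.compile(p) for p in (
    r"S_hc_", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}",
    r"_", r"[0-9]{4}", r"-", r"[0-9]{1,3}", r"T", r"[0-9]{6}", r"\.xml",
)]


def find_mismatch(filename):
    """Return None if filename is valid, else (offset, remainder, expected)."""
    pos = 0
    for rx in PARTS:
        # Pattern.match(string, pos) anchors the match at offset pos.
        mo = rx.match(filename, pos)
        if mo is None:
            return pos, filename[pos:], rx.pattern
        pos = mo.end()
    if pos != len(filename):
        # Everything matched, but extra characters follow the last piece.
        return pos, filename[pos:], "end of name"
    return None


print(find_mismatch("S_hc_1.2.3_2014-213T123121.xml"))
print(find_mismatch("S_hc_1.2.IncorrectName_2014-213T123121.xml"))
```

Because each piece is matched greedily and never revisited, this works for patterns like this one where consecutive pieces cannot overlap; a pattern that needs backtracking between pieces would require a different approach.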
Here is the method I am going to use; please let me know if there are cases where it does not work:
def verifyFileName(self, filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name
    :param pattern__: pattern
    :return: empty string if file name is correct, otherwise the incorrect part of file
    '''
    incorrectPart = ""
    # turn every '\.' and '_' in the pattern into its own alternative
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        if len(i) > 1:
            incorrectPart = i
    return incorrectPart
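Here is a counterexample. I took your method and defined three test cases, each a file name with its expected output.
Here is the output, with the code below: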
$> python m.py
S_hc_1.2.3_2014-213T123121.xml: PASS [expect None got None]
S_hc_1.2.3_Incorrect-213T123121.xml: PASS [expect Incorrect- got Incorrect-]
X_hc_1.2.3_2014-213T123121.xml: FAIL [expect X got None]
Here is the code; cut and paste it and run it.
import re

def verifyFileName(filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name
    :param pattern__: pattern
    :return: None if file name is correct, otherwise the incorrect part of file
    '''
    incorrectPart = None
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        # pieces of length 1 are ignored, which is why 'X' goes unnoticed
        if len(i) > 1:
            incorrectPart = i
    return incorrectPart

pattern = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"

# list of test cases: filenames + expected return from verifyFileName:
testcases = [
    # correct file name
    ("S_hc_1.2.3_2014-213T123121.xml", None),
    # obviously incorrect
    ("S_hc_1.2.3_Incorrect-213T123121.xml", "Incorrect-"),
    # subtly incorrect but still incorrect
    ("X_hc_1.2.3_2014-213T123121.xml", "X")
]

for (fn, expect) in testcases:
    res = verifyFileName(fn, pattern)
    print("{}: {} [expect {} got {}]".format(fn, "PASS" if res == expect else "FAIL", expect, str(res)))
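One way to handle the failing case, sketched here under assumptions rather than offered as a definitive fix, is to first cut the name into fields with a loose pattern that only fixes the separators, and then check each field against its strict pattern. The field labels (prefix, num1, and so on) and the function name first_bad_field are hypothetical; they are not part of the question or of either answer.

```python
import re

# Loose pattern: only the separators are fixed, each field accepts anything.
LOOSE = re.compile(
    r"^(?P<prefix>.*?)_hc_(?P<num1>.*?)\.(?P<num2>.*?)\.(?P<num3>.*?)"
    r"_(?P<year>.*?)-(?P<day>.*?)T(?P<time>.*?)\.xml$"
)

# Strict pattern for each field, taken from the original regular expression.
STRICT = {
    "prefix": r"S",
    "num1": r"[0-9]{1,2}",
    "num2": r"[0-9]{1,2}",
    "num3": r"[0-9]{1,2}",
    "year": r"[0-9]{4}",
    "day": r"[0-9]{1,3}",
    "time": r"[0-9]{6}",
}


def first_bad_field(filename):
    """Return None if valid, else the text of the first field that fails."""
    mo = LOOSE.match(filename)
    if mo is None:
        # The separators themselves are wrong, so no finer answer is possible.
        return filename
    for name, rx in STRICT.items():
        if re.fullmatch(rx, mo.group(name)) is None:
            return mo.group(name)
    return None


for fn in ("S_hc_1.2.3_2014-213T123121.xml",
           "S_hc_1.2.3_Incorrect-213T123121.xml",
           "X_hc_1.2.3_2014-213T123121.xml"):
    print(fn, "->", first_bad_field(fn))
```

With this sketch the third test case yields 'X', and the second yields 'Incorrect' without the trailing hyphen, because the hyphen is treated as a separator rather than as part of the field.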