
Find what part of a string does not match a regular expression in Python

To check whether file names are named correctly (using re), I use the following regular expression pattern:

^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$

This is a correct file name: *S_hc_1.2.3_2014-213T123121.xml*

This is an incorrect file name: *S_hc_1.2.IncorrectName_2014-213T123121.xml*

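I would like to know whether there is a simple way to retrieve the part of the file name that does not match the expression.

In the end, an error message like this should be displayed: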

Error, incorrect file name, the part 'IncorrectName' does not match with expected name. 

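You can use re.split together with next and a generator expression, but you also need to check that your string has the structure you expect, which you can do with the following re.match: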

re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)

In code:

>>> import re
>>> s1 = 'S_hc_1.2.3_2014-213T123121.xml'
>>> s2 = 'S_hc_1.2.IncorrectName_2014-213T123121.xml'
# with s1: every split piece is empty, so next() returns the default None and the REPL prints nothing
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s1) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s1)),None)
# with s2
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)),None)
'IncorrectName'

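You just need to put a pipe (|) between the distinct parts of your regex pattern; re.split will then split the string wherever any one of those patterns matches.

Any part that does not match one of your patterns will not be split away, so you can find it by looping over the split pieces!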
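To see that mechanism directly, here is what re.split returns for s2 (a Python 3 sketch reusing the answer's separator pattern; the variable name `separators` is only for illustration). Every matched piece leaves an empty string behind, so the unmatched text is the only non-empty element. Note that re.split alone does not confirm the overall structure, which is why the answer also calls re.match.

```python
import re

s2 = 'S_hc_1.2.IncorrectName_2014-213T123121.xml'

# The answer's pattern: every expected piece of the name, joined with |
separators = r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$'

pieces = re.split(separators, s2)
print(pieces)        # ['', '', '', 'IncorrectName', '', '', '', '', '', '']

# Keep only the text that no separator consumed
leftovers = [p for p in pieces if p]
print(leftovers)     # ['IncorrectName']
```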

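next(iterator[, default])

Retrieve the next item from the iterator by calling its next() method (__next__() in Python 3). If default is given, it is returned if the iterator is exhausted; otherwise StopIteration is raised.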
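A small illustration of why the answer passes None as the default (the lists below are made-up values, not output of the answer's code): with a default, an exhausted generator yields that default; without one, StopIteration escapes.

```python
# Made-up split results, just to exercise both branches of next()
pieces = ['', '', 'IncorrectName', '']

# First non-empty piece, or None if there is none
first_bad = next((p for p in pieces if p), None)
print(first_bad)   # IncorrectName

# Nothing qualifies, so the default is returned instead of raising
nothing = next((p for p in ['', ''] if p), None)
print(nothing)     # None

try:
    next(p for p in ['', ''] if p)   # no default given
except StopIteration:
    print("StopIteration")
```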

If you prefer several lines instead of the one-liner with next():

>>> for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2):
...   if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2):
...        print(i)
... 
IncorrectName
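The structural re.match from this answer can also do the locating by itself. A minimal sketch of that variation (not part of the answer; needs Python 3.4+ for re.fullmatch): capture each field with the loose pattern, then test each captured field against the strict sub-pattern it came from in the question's regex. `FIELD_RULES` and `find_bad_field` are hypothetical names, and the field list is an assumption read off the question's pattern.

```python
import re

# Loose structural pattern from the answer: one capture group per field
STRUCTURE = re.compile(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$")

# Hypothetical table: the strict rule for each captured group,
# taken piece by piece from the question's full regex
FIELD_RULES = [
    r"[0-9]{1,2}",           # first version number
    r"[0-9]{1,2}",           # second version number
    r"[0-9]{1,2}",           # third version number
    r"[0-9]{4}",             # four-digit year
    r"[0-9]{1,3}T[0-9]{6}",  # day of year, 'T', six-digit time
]


def find_bad_field(name):
    """Return the first field that breaks its rule, None if all fields are
    fine, or the whole name if even the loose structure does not match."""
    mo = STRUCTURE.match(name)
    if mo is None:
        return name
    for value, rule in zip(mo.groups(), FIELD_RULES):
        if not re.fullmatch(rule, value):
            return value
    return None


bad = find_bad_field("S_hc_1.2.IncorrectName_2014-213T123121.xml")
if bad is not None:
    print("Error, incorrect file name, the part '{}' does not match "
          "with expected name.".format(bad))
```

Because the groups use greedy `(.*)`, a name with extra dots, underscores or hyphens may be carved into fields differently than intended; in that case the reported part is still wrong, just possibly not the most precise one.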

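This may be a longer solution, but it tells you why the match failed and what was expected. It is similar to Kasra's solution: break the file name into individual pieces and match them one after another. This lets you find out where the match breaks: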

import re
from functools import reduce  # a builtin in Python 2; in Python 3 it lives in functools
# break up the big file name pattern into individual bits that we can match
RX      = re.compile
pattern = [
        RX(r"\*"),
        RX(r"S_hc_"),
        RX(r"[0-9]{1,2}"),
        RX(r"\."),
        RX(r"[0-9]{1,2}"),
        RX(r"\."),
        RX(r"[0-9]{1,2}"),
        RX(r"_"),
        RX(r"[0-9]{4}"),
        RX(r"-"),
        RX(r"[0-9]{1,3}"),
        RX(r"T"),
        RX(r"[0-9]{6}"),
        RX(r"\.xml"),
        RX(r"\*")
        ]

# 'fn' is the file name matched so far
def reductor(fn, rx):
    if fn is None:
        return None
    mo = rx.match(fn)
    if mo is None:
        print("File name mismatch: got {}, expected {}".format(fn, rx.pattern))
        return None
    # proceed with the remainder of the string
    return fn[mo.end():]


# valid only if every piece matched AND nothing is left over at the end
validFile = lambda fn: reduce(reductor, pattern, fn) == ""

Let's test it:

print(validFile("*S_hc_1.2.3_2014-213T123121.xml*"))
print(validFile("*S_hc_1.2.IncorrectName_2014-213T123121.xml*"))

Output:

True
File name mismatch: got IncorrectName_2014-213T123121.xml*, expected [0-9]{1,2}
False
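The same piece-by-piece idea can be written as a plain loop that returns the failure details instead of printing them, which makes it easy to build the error message from the question. A Python 3 sketch; `PIECES` and `first_mismatch` are hypothetical names, and unlike the answer it assumes the names carry no surrounding `*`.

```python
import re

# The question's pattern split into consecutive pieces (no surrounding '*')
PIECES = [re.compile(p) for p in (
    r"S_hc_", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}",
    r"_", r"[0-9]{4}", r"-", r"[0-9]{1,3}", r"T", r"[0-9]{6}", r"\.xml",
)]


def first_mismatch(fn):
    """Return None if fn matches completely, otherwise a tuple
    (position, expected sub-pattern, remaining text)."""
    pos = 0
    for rx in PIECES:
        mo = rx.match(fn, pos)   # anchored at pos, no need to slice fn
        if mo is None:
            return pos, rx.pattern, fn[pos:]
        pos = mo.end()
    if pos != len(fn):
        # every piece matched, but extra characters remain at the end
        return pos, "end of name", fn[pos:]
    return None


result = first_mismatch("S_hc_1.2.IncorrectName_2014-213T123121.xml")
if result is not None:
    pos, expected, rest = result
    print("Error at position {}: expected {}, got {!r}".format(pos, expected, rest))
```

Each piece is matched greedily on its own, with no backtracking between pieces. That is acceptable here because every digit run in this format is followed by a non-digit separator.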

This is the method I'm going to use; please tell me if there is a case it does not handle:

def verifyFileName(self, filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name 
    :param pattern__: pattern
    :return: empty string if the file name is correct, otherwise the incorrect part of the file name
    '''
    incorrectPart =""
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        if len(i)>1:
            incorrectPart = i
    return incorrectPart
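It helps to look at the pattern this method actually hands to re.split. A Python 3 sketch (the variable names are illustrative): after the two replacements, the `[0-9]{1,2}` alternative sits before the date/time alternative, so `2014-213T123121` is chopped into two-digit chunks and leaves `-` and `T` behind even for a correct name. The `len(i)>1` filter hides those, and it would equally hide any single-character mistake.

```python
import re

# The question's pattern, as a raw string
pattern = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"

# The same two replacements verifyFileName performs
split_pattern = pattern.replace(r'\.', r'|\.|').replace('_', '|_|')
print(split_pattern)
# ^S|_|hc|_|[0-9]{1,2}|\.|[0-9]{1,2}|\.|[0-9]{1,2}|_|[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}|\.|xml$

# Splitting a *correct* name still leaves non-empty pieces
pieces = [p for p in re.split(split_pattern, "S_hc_1.2.3_2014-213T123121.xml") if p]
print(pieces)  # ['-', 'T']
```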

Here is a counterexample. I took your method and defined three test cases: file names and the expected output.

Here is the output; the code follows:

$> python m.py
S_hc_1.2.3_2014-213T123121.xml: PASS [expect None got None]
S_hc_1.2.3_Incorrect-213T123121.xml: PASS [expect Incorrect- got Incorrect-]
X_hc_1.2.3_2014-213T123121.xml: FAIL [expect X got None]

Here is the code: copy, paste and run it.

import re


def verifyFileName(filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name 
    :param pattern__: pattern
    :return: None if the file name is correct, otherwise the incorrect part of the file name
    '''
    incorrectPart = None
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        if len(i)>1:
            incorrectPart = i
    return incorrectPart


pattern = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"

# list of test cases: filenames + expected return from verifyFileName:
testcases = [
        # correct file name
        ("S_hc_1.2.3_2014-213T123121.xml", None),
        # obviously incorrect
        ("S_hc_1.2.3_Incorrect-213T123121.xml", "Incorrect-"),
        # subtly incorrect but still incorrect
        ("X_hc_1.2.3_2014-213T123121.xml", "X")
        ]

for (fn, expect) in testcases:
    res = verifyFileName(fn, pattern)
    print("{}: {} [expect {} got {}]".format(fn, "PASS" if res == expect else "FAIL", expect, str(res)))
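One way to read this counterexample: decide validity with the full, anchored pattern first, and only search for the offending part once that check fails. A short Python 3 sketch over the same three test names (the names `names` and `results` are illustrative); the full regex rejects the `X_hc_` case that the split heuristic lets through.

```python
import re

pattern = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"

names = [
    "S_hc_1.2.3_2014-213T123121.xml",
    "S_hc_1.2.3_Incorrect-213T123121.xml",
    "X_hc_1.2.3_2014-213T123121.xml",
]

# Validity comes from the complete pattern; only names marked invalid
# need a second step that locates the offending part
results = {fn: re.match(pattern, fn) is not None for fn in names}
for fn, ok in results.items():
    print("{}: {}".format(fn, "valid" if ok else "invalid"))
```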


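Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.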

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM