
Find which part of a string does not match a regular expression (Python)

To check whether a file name is correctly named (using re), I use the following regular expression pattern:

^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$

Here is a correct file name: S_hc_1.2.3_2014-213T123121.xml

Here is an incorrect file name: S_hc_1.2.IncorrectName_2014-213T123121.xml
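For reference, this is a minimal check that the full pattern accepts the first name and rejects the second; the name FILENAME_RE is introduced here only for illustration:

```python
import re

# The question's pattern, compiled once (FILENAME_RE is an illustrative name)
FILENAME_RE = re.compile(
    r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"
)

for name in ("S_hc_1.2.3_2014-213T123121.xml",
             "S_hc_1.2.IncorrectName_2014-213T123121.xml"):
    # re.match returns a match object on success and None on failure
    print(name, FILENAME_RE.match(name) is not None)
```

On its own this only gives a yes/no answer, which is exactly why the question asks how to locate the failing part.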

I would like to know whether there is a simple way to retrieve the part of the file name that does not match.

In the end, an error message like this would be displayed:

Error, incorrect file name, the part 'IncorrectName' does not match with expected name. 

You can use re.split with a generator expression inside next, but you also need to check that the overall structure of your string matches what you want. You can do that with the following re.match:

re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)

And in code:

>>> import re
>>> s1 = 'S_hc_1.2.3_2014-213T123121.xml'
>>> s2 = 'S_hc_1.2.IncorrectName_2014-213T123121.xml'
# with s1 (correct name): next() returns None, so the REPL prints nothing
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s1) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s1)),None)
# with s2
>>> next((i for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2) if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2)),None)
'IncorrectName'

All you need to do is put a pipe ( | ) between the distinct parts of your regex pattern; re.split will then split your string wherever one of those parts matches.

Any part that doesn't match one of your patterns is not consumed by the split, so you can find it by looping over the split result.
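To see why this works, here is the list that re.split produces, using the same alternation as above (PARTS is just a local name for it). Empty strings appear wherever two separators were adjacent, so filtering them out leaves only the text that no alternative consumed:

```python
import re

# Each alternative is one piece of the expected name; re.split treats them as separators
PARTS = r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$'

pieces = re.split(PARTS, 'S_hc_1.2.IncorrectName_2014-213T123121.xml')
# Whatever survives as a non-empty piece was not matched by any alternative
leftovers = [p for p in pieces if p]
print(leftovers)  # ['IncorrectName']

# For the correct name every character is consumed, so nothing is left over
print([p for p in re.split(PARTS, 'S_hc_1.2.3_2014-213T123121.xml') if p])  # []
```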

next(iterator[, default])

Retrieve the next item from the iterator by calling its next() method. If default is given, it is returned if the iterator is exhausted, otherwise StopIteration is raised.
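A tiny illustration of both behaviours; the list words is made up to resemble split output, it is not taken from a real split:

```python
# next() with a default returns the first item the generator yields...
words = ['', '', 'IncorrectName', '']
print(next((w for w in words if w), None))  # IncorrectName

# ...or the default when the generator yields nothing
print(next((w for w in [] if w), None))  # None

# Without a default, an exhausted iterator raises StopIteration
try:
    next(iter([]))
except StopIteration:
    print('StopIteration raised')
```

This is why the answer passes None as the second argument: a correct file name produces no non-empty pieces, and next quietly returns None instead of raising.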

If you want it on several lines:

>>> for i in re.split(r'^S_hc_|[0-9]{1,2}\.|[0-9]{1,2}_|_|[0-9]{4,4}|-|[0-9]{1,3}T[0-9]{6}|\.|xml$',s2):
...     if i and re.match(r"^S_hc_(.*)\.(.*)\.(.*)_(.*)-(.*)\.xml$",s2):
...         print(i)
... 
IncorrectName
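A variation on the same idea, sketched here as an alternative rather than as part of the answer: let the permissive structure regex capture each field with a named group, then check every captured field against its own strict sub-pattern and report the first one that fails. The group names, FIELD_RULES and first_bad_part are hypothetical, and re.fullmatch needs Python 3.4 or later:

```python
import re

# Permissive structure check with named groups (group names are illustrative)
STRUCTURE = re.compile(
    r"^S_hc_(?P<major>.*)\.(?P<minor>.*)\.(?P<patch>.*)_(?P<year>.*)-(?P<stamp>.*)\.xml$"
)

# Strict rule for each field, taken from the question's full pattern
FIELD_RULES = [
    ("major", r"[0-9]{1,2}"),
    ("minor", r"[0-9]{1,2}"),
    ("patch", r"[0-9]{1,2}"),
    ("year", r"[0-9]{4}"),
    ("stamp", r"[0-9]{1,3}T[0-9]{6}"),
]

def first_bad_part(filename):
    """Return None if filename is valid, else the first offending part."""
    m = STRUCTURE.match(filename)
    if m is None:
        # The overall layout is wrong, so no single field can be blamed
        return filename
    for field, rule in FIELD_RULES:
        value = m.group(field)
        # fullmatch anchors the rule at both ends of the captured field
        if re.fullmatch(rule, value) is None:
            return value
    return None

print(first_bad_part("S_hc_1.2.IncorrectName_2014-213T123121.xml"))
```

The .* groups are greedy, so a name containing extra dots, underscores or dashes may be divided among the fields differently than you expect; the reported part is still one that fails its rule.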

Maybe this is a longer solution, but it will tell you what failed and what was expected. It is similar to Kasra's solution: it breaks the file name up into individual pieces and matches them in turn. This allows you to find out where the matching breaks:

import re
from functools import reduce  # a builtin in Python 2, moved to functools in Python 3

# break up the big file name pattern into individual bits that we can match
RX      = re.compile
pattern = [
        RX(r"\*"),
        RX(r"S_hc_"),
        RX(r"[0-9]{1,2}"),
        RX(r"\."),
        RX(r"[0-9]{1,2}"),
        RX(r"\."),
        RX(r"[0-9]{1,2}"),
        RX(r"_"),
        RX(r"[0-9]{4}"),
        RX(r"-"),
        RX(r"[0-9]{1,3}"),
        RX(r"T"),
        RX(r"[0-9]{6}"),
        RX(r"\.xml"),
        RX(r"\*")
        ]

# 'fn' is the part of the file name that has not been matched yet
def reductor(fn, rx):
    # an earlier piece already failed: keep propagating the failure
    if fn is None:
        return None
    mo = rx.match(fn)
    if mo is None:
        print("File name mismatch: got {}, expected {}".format(fn, rx.pattern))
        return None
    # proceed with the remainder of the string
    return fn[mo.end():]


# valid only if every piece matched and nothing is left over at the end
validFile = lambda fn: reduce(reductor, pattern, fn) == ""

Let's test it:

print(validFile("*S_hc_1.2.3_2014-213T123121.xml*"))
print(validFile("*S_hc_1.2.IncorrectName_2014-213T123121.xml*"))

Output:

True
File name mismatch: got IncorrectName_2014-213T123121.xml*, expected [0-9]{1,2}
False
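If you would rather build the question's error message yourself than print inside reductor, the same piece-by-piece idea can return the unmatched remainder and the pattern that failed. This is a sketch of a variation, not the answer's code: first_mismatch and PIECES are hypothetical names, the leading and trailing \* pieces are dropped so it works on bare file names, and it uses a plain loop instead of reduce:

```python
import re

# Same pieces as above, without the \* anchors
PIECES = [re.compile(p) for p in (
    r"S_hc_", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}", r"\.", r"[0-9]{1,2}",
    r"_", r"[0-9]{4}", r"-", r"[0-9]{1,3}", r"T", r"[0-9]{6}", r"\.xml",
)]

def first_mismatch(fn):
    """Return None if fn matches every piece exactly, else (rest, expected)."""
    rest = fn
    for rx in PIECES:
        mo = rx.match(rest)
        if mo is None:
            return rest, rx.pattern
        rest = rest[mo.end():]
    # Text left over after the last piece is also an error
    return (rest, "end of name") if rest else None

result = first_mismatch("S_hc_1.2.IncorrectName_2014-213T123121.xml")
if result is not None:
    rest, expected = result
    print("Error, incorrect file name, the part '{}' does not match "
          "with expected name (expected {}).".format(rest, expected))
```

Note that rest is everything from the failure point onwards (here 'IncorrectName_2014-213T123121.xml'), not just the offending token.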

Here is the method I am going to use; please let me know if you find cases where it does not work:

def verifyFileName(self, filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name 
    :param pattern__: pattern
    :return: empty string if file name is correct, otherwise the incorrect part of file
    '''
    incorrectPart =""
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        if len(i)>1:
            incorrectPart = i
    return incorrectPart

Here's a counterexample. I've taken your method and defined three test cases: file names plus the expected output.

Here's the output; the code follows below:

$> python m.py
S_hc_1.2.3_2014-213T123121.xml: PASS [expect None got None]
S_hc_1.2.3_Incorrect-213T123121.xml: PASS [expect Incorrect- got Incorrect-]
X_hc_1.2.3_2014-213T123121.xml: FAIL [expect X got None]

This is the code; cut, paste and run it:

import re

def verifyFileName(filename__, pattern__):
    '''
    Verifies if a file name is correct
    :param filename__: file name 
    :param pattern__: pattern
    :return: None if file name is correct, otherwise the incorrect part of file
    '''
    incorrectPart = None
    pattern = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
    for i in re.split(pattern, filename__):
        if len(i)>1:
            incorrectPart = i
    return incorrectPart


pattern = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"

# list of test cases: filenames + expected return from verifyFileName:
testcases = [
        # correct file name
        ("S_hc_1.2.3_2014-213T123121.xml", None),
        # obviously incorrect
        ("S_hc_1.2.3_Incorrect-213T123121.xml", "Incorrect-"),
        # subtly incorrect but still incorrect
        ("X_hc_1.2.3_2014-213T123121.xml", "X")
        ]

for (fn, expect) in testcases:
    res = verifyFileName(fn, pattern)
    print("{}: {} [expect {} got {}]".format(fn, "PASS" if res==expect else "FAIL", expect, str(res)))
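The FAIL on the last case can be traced by printing the pattern that the two replace calls build from the question's regex, and the pieces it leaves behind. This is only a diagnostic sketch; alternation and pieces are local names chosen for it:

```python
import re

# Same transformation as in verifyFileName, applied to the question's pattern
pattern__ = r"^S_hc_[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{1,2}_[0-9]{4,4}-[0-9]{1,3}T[0-9]{6,6}\.xml$"
alternation = pattern__.replace(r'\.', r'|\.|').replace('_', '|_|')
print(alternation)

# Non-empty leftovers for the 'X' test case
pieces = re.split(alternation, "X_hc_1.2.3_2014-213T123121.xml")
print([p for p in pieces if p])
```

The second print shows ['X', '-', 'T']. Because the standalone [0-9]{1,2} alternative comes first, it eats the year two digits at a time, so the longer [0-9]{4,4}-... alternative never matches and '-' and 'T' are left behind even for a correct name. The len(i) > 1 filter is what hides those, and it hides the single-character 'X' along with them.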
