
Python regular expression to check the start and end of a word in a string

I am working on a script to rename files. In this scenario there are three possibilities.

1. File does not exist: create a new file.

2. File exists: create a new file with '(number of occurrences of the file)' appended to the filename, e.g. filename(1).

3. A duplicate of the file already exists: create a new file with '(number of occurrences of the file)' appended to the filename, e.g. filename(2).

I have the filename in a string. I can check the last character of the filename using a regex, but how do I check the last characters from '(' to ')' and get the number inside them?

You just need something like this:

(?<=\()(\d+)(?=\)[^()]*$)


Explanation:

  • (?<=\() must be preceded by a literal (
  • (\d+) match and capture the digits
  • (?=\)[^()]*$) must be followed by ) and then no more ( or ) until the end of the string.

Example: if the file name is Foo (Bar) Baz (23).jpg, the regex above matches 23.
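As a minimal sketch of how that pattern might be used from Python: the helper name last_number and the sample filenames other than the one above are assumptions made for illustration, not part of the original answer.

```python
import re

# Digits inside the last parenthesised group of the name.
LAST_NUMBER = re.compile(r"(?<=\()(\d+)(?=\)[^()]*$)")

def last_number(filename):
    """Return the number in the final '(n)' group, or None if there is none."""
    m = LAST_NUMBER.search(filename)
    return int(m.group(1)) if m else None

print(last_number("Foo (Bar) Baz (23).jpg"))  # 23
print(last_number("report(2)"))               # 2
print(last_number("plain.txt"))               # None
```

Because the lookahead uses [^()]*, the pattern also matches when nothing follows the closing parenthesis, as in report(2).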

Here is the code and tests to get a filename based on existing filenames:

import re

def get_name(filename, existing_names):
    """Return filename, or filename(n) if it is already in existing_names."""
    exist = False
    index = 0

    # Escape the name so characters such as "." or "(" are matched literally.
    p = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % re.escape(filename))

    for name in existing_names:
        m = p.match(name)
        if m:
            exist = True
            idx = m.group('idx')
            # Keep track of the highest index seen so far.
            if idx and int(idx) > index:
                index = int(idx)
    if exist:
        return "%s(%d)" % (filename, index + 1)
    else:
        return filename

# test data
exists = ["abc(1)", "ab", "abc", "abc(2)", "ab(1)", "de", "ab(5)"]
tests = ["abc", "ab", "de", "xyz"]
expects = ["abc(3)", "ab(6)", "de(1)", "xyz"]

print(exists)
for name, exp in zip(tests, expects):
    new_name = get_name(name, exists)
    print("%s -> %s" % (name, new_name))
    assert new_name == exp
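One thing to watch when a filename is interpolated into a pattern like the one in get_name: characters such as . or + are regex metacharacters. The sketch below, which uses a made-up filename a.b, suggests why passing the name through re.escape is a reasonable precaution.

```python
import re

name = "a.b"  # hypothetical filename containing a regex metacharacter

unescaped = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % name)
escaped = re.compile(r"^%s(\((?P<idx>\d+)\))?$" % re.escape(name))

# Unescaped, "." matches any character, so an unrelated name is accepted.
print(bool(unescaped.match("axb(1)")))  # True
# Escaped, only the literal name (optionally followed by an index) matches.
print(bool(escaped.match("axb(1)")))    # False
print(bool(escaped.match("a.b(1)")))    # True
```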

The part of the pattern that extracts the number inside the parentheses is the optional group at the end:

(\((?P<idx>\d+)\))?

It uses a named capture, (?P<idx>\d+), for the digits \d+, and the capture is read back later with m.group('idx').
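A short sketch of how the named group behaves, assuming a fixed base name abc and a hypothetical helper idx_for: when the optional "(n)" part is absent, the match still succeeds but the group is None, which is why the code checks idx before calling int on it.

```python
import re

pattern = re.compile(r"^abc(\((?P<idx>\d+)\))?$")

def idx_for(name):
    """Return the captured index string, None if absent, or 'no match'."""
    m = pattern.match(name)
    if m is None:
        return "no match"
    # group('idx') is None when the optional "(n)" part did not participate.
    return m.group("idx")

for name in ["abc", "abc(2)", "abcd"]:
    print(name, "->", idx_for(name))
```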
