
Using regex in Python to get episode numbers from file names

I have collected a large number of TV series on my media server over the years. I wrote a script to go through and rename them all with a proper filename scheme, but I am having trouble with the regex when trying to target multiple naming schemes.

This is my current function, which works well for getting the episode number from filenames that use the scheme "s01e01":

import re

def getEpisode(filename):
    matches = re.findall(r"e[0-9][0-9]", filename)
    if len(matches) == 1:
        episode = matches[0]
        episode = stripEp(episode)
        return episode  
    else:
        return False

def stripEp(target):
    target = target.strip()
    target = target.strip('abcdefghijklmnopqrstuvwxyz.')
    return target

What I need to do is grab the episode number from a filename when multiple schemes are being used. I spent a while googling and tried the following.

matches = re.findall(r"(e[0-9][0-9]|E[0-9][0-9]|x[0-9][0-9]|X[0-9][0-9]|episode [0-9][0-9]|Episode [0-9][0-9]|\n[0-9][0-9])", filename)

This works in regex testers such as RegexPal and Python Regex Tool.

When I plug it into my function, however, it doesn't work. This has me stumped, since it seems to work in the Python regex tool I linked to above. Any help would be greatly appreciated.

EDIT: Here are some examples of the schemes the files use.

Series Name s01e01.avi

Series Name 1x01.avi

Series Name episode 01.avi

01 Episode Title.avi

The filename does not contain a newline character ('\n'), so that alternative never matches. You could use ^ to anchor the match at the start of the string instead:

import re

def getEpisode(filename):
    match = re.search(
        r'''(?ix)                 # Ignore case (i), and use verbose regex (x)
        (?:                       # non-capturing group
          e|x|episode|^           # e or x or episode or start of the string
          )                       # end non-capturing group
        \s*                       # 0-or-more whitespace characters
        (\d{2})                   # exactly 2 digits
        ''', filename)
    if match:
        return match.group(1)

tests = (
    'Series Name s01e01.avi',
    'Series Name 1x01.avi',
    'Series Name episode 01.avi',
    '01 Episode Title.avi'
    )
for filename in tests:
    print(getEpisode(filename))

yields

01
01
01
01
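As a quick check of why the pattern from the question fails on the last filename, the sketch below compares the `\n` alternative with the `^` anchor. It uses only the patterns already shown; the variable names are made up for illustration.

```python
import re

filename = '01 Episode Title.avi'

# The filename contains no newline character, so "\n[0-9][0-9]" cannot match.
newline_matches = re.findall(r"\n[0-9][0-9]", filename)

# "^" is a zero-width anchor at the start of the string; it consumes no character.
anchor_matches = re.findall(r"^[0-9][0-9]", filename)

print(newline_matches)  # []
print(anchor_matches)   # ['01']
```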

I removed else: return False because Python returns None when it reaches the end of a function without returning anything. Since None is falsy, you can test for no match with episode = getEpisode(filename); if episode: ... .
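A minimal sketch of that calling pattern follows. The function name `get_episode` and the sample filename `notes.txt` are assumptions for illustration, and the regex is a compact form of the verbose one above. Comparing against None explicitly is one option; a plain truthiness test also works here because any matched string such as '00' is non-empty.

```python
import re

def get_episode(filename):
    """Return the two-digit episode number as a string, or None if absent."""
    match = re.search(r"(?i)(?:e|x|episode|^)\s*(\d{2})", filename)
    if match:
        return match.group(1)
    # Reaching the end of the function returns None implicitly.

for name in ('Series Name 1x01.avi', 'notes.txt'):
    episode = get_episode(name)
    if episode is None:
        print(name, '-> no episode number found')
    else:
        print(name, '-> episode', episode)
```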

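For clarity, the regular expression can be shortened (using ^ rather than \n, since a filename contains no newline, and allowing optional whitespace before the digits):

re.findall(r"(?:e|x|episode|^)\s*(\d{2})", filename, re.I)

and to get the season as well, for names that carry a season prefix such as s01e01:

re.findall(r"(?:s|season)\s*(\d{2})\s*(?:e|x|episode)\s*(\d{2})", filename, re.I)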
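One possible way to combine the two ideas is to try a season-plus-episode pattern first and fall back to an episode-only pattern. This is only a sketch: `parse_season_episode`, the two compiled pattern names, and the 'Show Season 02 Episode 05.mkv' filename are hypothetical, and both patterns use `^` instead of `\n` and allow optional whitespace.

```python
import re

# Season followed by episode, e.g. "s01e01" or "Season 02 Episode 05".
SEASON_EPISODE = re.compile(
    r"(?:s|season)\s*(\d{2})\s*(?:e|x|episode)\s*(\d{2})", re.I)

# Episode only, e.g. "1x01", "episode 01", or a leading "01".
EPISODE_ONLY = re.compile(r"(?:e|x|episode|^)\s*(\d{2})", re.I)

def parse_season_episode(filename):
    """Return (season, episode) as strings; either may be None if not found."""
    match = SEASON_EPISODE.search(filename)
    if match:
        return match.group(1), match.group(2)
    match = EPISODE_ONLY.search(filename)
    if match:
        return None, match.group(1)
    return None, None

for name in ('Series Name s01e01.avi',
             'Series Name 1x01.avi',
             'Show Season 02 Episode 05.mkv'):
    print(name, '->', parse_season_episode(name))
```

Note that with this sketch a name like "1x01" yields no season, because the season pattern requires an "s" or "season" prefix.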
