Python regular expression to match a string pattern and return the sub string

I have many files with names like:

<some name>_2536by1632.jpg
<some name1>_4800by2304.JPG
<some name2>_904by904.jpg

So the name part varies, and the extension is always jpg, though it may be in capitals. Only a limited set of values is possible for x and y in <x>by<y>, and I have them in a list in this format:

possible_sizes = [ (2536,1632), (4800,2304), ...]

I need to test whether a filename matches this pattern and, if it does, return the <x>by<y> string.

As of now, I do this without using regex. Something like this:

for item in possible_sizes:
    if "_{0}by{1}.jpg".format(item[0],item[1]) in filename.lower():
        dimension = "{0}by{1}".format(item[0],item[1])

But it's not a very clean solution, especially since the list of possible sizes may grow in the future.

How can I do this with a regex?

You could just use Python's string methods:

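import os

# A frozenset gives O(1) average lookup time
possible_sizes = frozenset([(2536, 1632), (4800, 2304), ...])

# Strip the extension first, then split on the last underscore only,
# because the name part may itself contain underscores
name, extension = os.path.splitext(filename)
title, size = name.rsplit('_', 1)
width, height = map(int, size.split('by'))

if (width, height) in possible_sizes:
    print(width, height)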
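
If the filenames are not guaranteed to follow the pattern, the unpacking above raises an exception on the first name that doesn't fit. Here is a minimal sketch of the same string-method idea wrapped in a function that returns the <x>by<y> string or None. The function name size_from_filename and the two-entry POSSIBLE_SIZES set are my own placeholders, not something from the question:

```python
import os

# Illustrative subset only; the real list in the question is elided with "...".
POSSIBLE_SIZES = frozenset([(2536, 1632), (4800, 2304)])


def size_from_filename(filename, possible_sizes=POSSIBLE_SIZES):
    """Return the "<x>by<y>" part of filename, or None if it does not qualify."""
    name, extension = os.path.splitext(filename)
    # Accept .jpg in any letter case.
    if extension.lower() != ".jpg":
        return None
    # rpartition splits on the last underscore; sep is "" if there is none.
    _, sep, size = name.rpartition("_")
    if not sep:
        return None
    width, by, height = size.partition("by")
    # Both sides of "by" must be plain decimal numbers.
    if not (by and width.isdecimal() and height.isdecimal()):
        return None
    if (int(width), int(height)) not in possible_sizes:
        return None
    return size
```

Calling size_from_filename("<some name1>_4800by2304.JPG") gives "4800by2304", while a name with an unlisted size such as "<some name2>_904by904.jpg" gives None.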

This might not be the smartest regex, but it should be easy to read.

The string:

  1. can start with anything: ^.*
  2. must then contain an underscore: _
  3. followed by a number (at least one digit): \d+
  4. next is the literal text 'by': by
  5. followed by a number (at least one digit): \d+
  6. and must end with .jpg or .JPG: \.(jpg|JPG)$

(?P<X> ....) makes a match accessible by the name X.

This leads to the expression r"^.*_((?P<X>\d+)by(?P<Y>\d+))\.(jpg|JPG)$" (written as a raw string so the backslashes don't need to be doubled).

Example program:

import re

possible_sizes = [ ( 2536, 1632 ), ( 4800, 2304 )]
names = ["<some name>_2536by1632.jpg", "<some name1>_4800by2304.JPG", "<some name2>_904by904.jpg"]
pattern = r"^.*_((?P<X>\d+)by(?P<Y>\d+))\.(jpg|JPG)$"

for name in names:
    matchobj = re.match( pattern, name )
    if matchobj:
        if ( int( matchobj.group( "X" ) ), int( matchobj.group( "Y" ) ) ) in possible_sizes:
            print( matchobj.group( 1 ) )

Output

2536by1632

4800by2304
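
A possible variation, in case mixed-case extensions such as .Jpg also turn up: compile the pattern once with re.IGNORECASE and return the matched substring from a function. This is only a sketch; the names SIZE_RE and match_size and the two-entry size set are placeholders I made up. Note that re.IGNORECASE also lets "BY" match, which may or may not be what you want.

```python
import re

# Illustrative subset only; the real list in the question is elided with "...".
POSSIBLE_SIZES = {(2536, 1632), (4800, 2304)}

# Raw string so the backslashes reach the regex engine unchanged.
# search() is used below, so no leading ^.* is needed.
SIZE_RE = re.compile(r"_(?P<size>(?P<x>\d+)by(?P<y>\d+))\.jpg$", re.IGNORECASE)


def match_size(filename, possible_sizes=POSSIBLE_SIZES):
    """Return the "<x>by<y>" substring, or None if the name does not qualify."""
    m = SIZE_RE.search(filename)
    if m is None:
        return None
    # Only accept dimensions that appear in the allowed set.
    if (int(m.group("x")), int(m.group("y"))) not in possible_sizes:
        return None
    return m.group("size")
```

With the three example names from the question, match_size returns "2536by1632" and "4800by2304" for the first two and None for the 904by904 file.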

This doesn't get at the spirit of your question, but I think it would actually work:

possible_sizes = {"_2536by1632.jpg": (2536, 1632), "_4800by2304.jpg": (4800, 2304)}

def find_size(filename):
    # Lower-case first so ".JPG" matches too; returns None if no suffix fits.
    for suffix, size in possible_sizes.items():
        if filename.lower().endswith(suffix):
            return size
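
Since the list of sizes may grow, the suffix table doesn't have to be typed out by hand. Here is a rough sketch that derives it from the list of tuples in the question and returns the <x>by<y> string; suffix_to_dimension and dimension_for are names I invented for illustration, and the list holds only the two sizes shown in the question:

```python
# Illustrative subset only; the real list in the question is elided with "...".
possible_sizes = [(2536, 1632), (4800, 2304)]

# Map each expected lower-case suffix to its "<x>by<y>" string.
suffix_to_dimension = {
    "_{0}by{1}.jpg".format(w, h): "{0}by{1}".format(w, h)
    for w, h in possible_sizes
}


def dimension_for(filename):
    """Return the "<x>by<y>" string for filename, or None if no suffix fits."""
    lowered = filename.lower()  # so ".JPG" is treated like ".jpg"
    for suffix, dimension in suffix_to_dimension.items():
        if lowered.endswith(suffix):
            return dimension
    return None
```

This still checks every known suffix for each filename, so it behaves like the loop in the question; the difference is that adding a size only means adding a tuple to possible_sizes.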
