
Cut out a sequence of files using glob in Python

I have a directory with files like img-0001.jpg, img-0005.jpg, img-0006.jpg, ..., img-xxxx.jpg. What I need is a list of all files starting at 0238, i.e. from img-0238.jpg onward. The next existing filename after that is img-0240.jpg.

Right now I use glob to get all filenames.

import glob
list_images = glob.glob(path_images + "*.jpg")

Thanks in advance

Edit: The last filename is img-0315.jpg.

glob doesn't support regex filtering, but you can filter the list right after you receive all the matching files. Here is how it would look using re:

import glob
import re

# The alternatives are grouped so that the \.jpg$ suffix applies to every branch.
list_images = [f for f in glob.glob(path_images + "*.jpg")
               if re.search(r'(?:[1-9]\d{3}|0[3-9]\d{2}|02[4-9]\d|023[89])\.jpg$', f)]

The regular expression verifies that the filename ends with a four-digit number greater than or equal to 0238, followed by .jpg.

You can experiment with the regular expression at https://regex101.com/

Basically, we check whether the number:

  • starts with [1-9], followed by any 3 digits
  • or starts with 0[3-9], followed by any 2 digits
  • or starts with 02[4-9], followed by any 1 digit
  • or starts with 023, followed by either 8 or 9.
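To try the four cases above without touching the filesystem, here is a minimal sketch. The helper name is_at_least_0238 and the sample filenames are made up for illustration, and the alternatives are wrapped in a non-capturing group so that the .jpg suffix is required for every branch:

```python
import re

# Four digits >= 0238, immediately followed by ".jpg" at the end of the name.
PATTERN = re.compile(r'(?:[1-9]\d{3}|0[3-9]\d{2}|02[4-9]\d|023[89])\.jpg$')

def is_at_least_0238(filename):
    """Return True if the name ends in a 4-digit number >= 0238 plus .jpg."""
    return PATTERN.search(filename) is not None

# Hypothetical sample names, not read from disk.
samples = ["img-0001.jpg", "img-0237.jpg", "img-0238.jpg",
           "img-0240.jpg", "img-0315.jpg", "img-1000.jpg"]
selected = [name for name in samples if is_at_least_0238(name)]
print(selected)
```

Note that this pattern only enforces the lower bound; it does not stop at 0315.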

But it would probably be easier to do a simple string comparison, which works here because the numbers are zero-padded to a fixed width:

list_images = [f for f in glob.glob(path_images + "*.jpg")
               if "0237" < f[-8:-4] < "0316"]

You can run several glob patterns to match all files whose number is 023[89], 02[4-9][0-9], 030[0-9] or 031[0-5]:

import glob
import os

list_images = []
for pattern in ('023[89]', '02[4-9][0-9]', '030[0-9]', '031[0-5]'):
    list_images.extend(glob.glob(
        os.path.join(path_images, '*{0}.jpg'.format(pattern))))
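If you want to convince yourself that these four patterns cover exactly 0238 through 0315, one option is to test them with fnmatch, which uses the same shell-style wildcard rules as glob. This is only a sketch: the generated filenames and the helper matches_any are assumptions for illustration.

```python
import fnmatch

patterns = ('023[89]', '02[4-9][0-9]', '030[0-9]', '031[0-5]')

def matches_any(filename):
    """Return True if the name matches at least one of the number patterns."""
    return any(fnmatch.fnmatchcase(filename, '*{0}.jpg'.format(p))
               for p in patterns)

# Hypothetical names img-0000.jpg .. img-0999.jpg, generated in memory.
names = ['img-{:04d}.jpg'.format(n) for n in range(1000)]
matched = [int(name[-8:-4]) for name in names if matches_any(name)]
print(matched[0], matched[-1], len(matched))
```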

Or you can just filter out the ones you don't want:

list_images = [x for x in glob.glob(os.path.join(path_images, "*.jpg"))
    if 238 <= int(x[-8:-4]) <= 315]
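The slice x[-8:-4] assumes every name has exactly four digits right before .jpg, and int() will raise ValueError on anything else. If the directory might contain other .jpg files, a slightly more defensive variant could extract the number with a regex instead. This is a sketch under that assumption; filter_by_number and the example paths are hypothetical.

```python
import os
import re

# Captures the digits between "img-" and ".jpg" in the file's base name.
NUMBER_RE = re.compile(r'img-(\d+)\.jpg$')

def filter_by_number(paths, first=238, last=315):
    """Keep paths whose image number lies in [first, last], sorted by name."""
    selected = []
    for path in paths:
        match = NUMBER_RE.search(os.path.basename(path))
        if match and first <= int(match.group(1)) <= last:
            selected.append(path)
    return sorted(selected)

# Hypothetical input; in practice this would be the list returned by glob.glob.
example = ['dir/img-0316.jpg', 'dir/img-0240.jpg',
           'dir/notes.jpg', 'dir/img-0238.jpg']
result = filter_by_number(example)
print(result)
```

With real files you would call it as filter_by_number(glob.glob(os.path.join(path_images, "*.jpg"))). The sort is there because glob.glob does not guarantee any particular order.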

For something like this, you could try the wcmatch library. It's a library that aims to enhance file globbing and wildcard matching.

In this example, we enable brace expansion and demonstrate the pattern by filtering a list of files:

from wcmatch import glob

files = []
# Generate list of files from img-0000.jpg to img-0315.jpg
for x in range(316):
    files.append('path/img-{:04d}.jpg'.format(x))

print(glob.globfilter(files, 'path/img-{0238..0315}.jpg', flags=glob.BRACE))

And we get the following output:

['path/img-0238.jpg', 'path/img-0239.jpg', 'path/img-0240.jpg', 'path/img-0241.jpg', 'path/img-0242.jpg', 'path/img-0243.jpg', 'path/img-0244.jpg', 'path/img-0245.jpg', 'path/img-0246.jpg', 'path/img-0247.jpg', 'path/img-0248.jpg', 'path/img-0249.jpg', 'path/img-0250.jpg', 'path/img-0251.jpg', 'path/img-0252.jpg', 'path/img-0253.jpg', 'path/img-0254.jpg', 'path/img-0255.jpg', 'path/img-0256.jpg', 'path/img-0257.jpg', 'path/img-0258.jpg', 'path/img-0259.jpg', 'path/img-0260.jpg', 'path/img-0261.jpg', 'path/img-0262.jpg', 'path/img-0263.jpg', 'path/img-0264.jpg', 'path/img-0265.jpg', 'path/img-0266.jpg', 'path/img-0267.jpg', 'path/img-0268.jpg', 'path/img-0269.jpg', 'path/img-0270.jpg', 'path/img-0271.jpg', 'path/img-0272.jpg', 'path/img-0273.jpg', 'path/img-0274.jpg', 'path/img-0275.jpg', 'path/img-0276.jpg', 'path/img-0277.jpg', 'path/img-0278.jpg', 'path/img-0279.jpg', 'path/img-0280.jpg', 'path/img-0281.jpg', 'path/img-0282.jpg', 'path/img-0283.jpg', 'path/img-0284.jpg', 'path/img-0285.jpg', 'path/img-0286.jpg', 'path/img-0287.jpg', 'path/img-0288.jpg', 'path/img-0289.jpg', 'path/img-0290.jpg', 'path/img-0291.jpg', 'path/img-0292.jpg', 'path/img-0293.jpg', 'path/img-0294.jpg', 'path/img-0295.jpg', 'path/img-0296.jpg', 'path/img-0297.jpg', 'path/img-0298.jpg', 'path/img-0299.jpg', 'path/img-0300.jpg', 'path/img-0301.jpg', 'path/img-0302.jpg', 'path/img-0303.jpg', 'path/img-0304.jpg', 'path/img-0305.jpg', 'path/img-0306.jpg', 'path/img-0307.jpg', 'path/img-0308.jpg', 'path/img-0309.jpg', 'path/img-0310.jpg', 'path/img-0311.jpg', 'path/img-0312.jpg', 'path/img-0313.jpg', 'path/img-0314.jpg', 'path/img-0315.jpg']

So, we could apply this to a file search:

from wcmatch import glob

list_images = glob.glob('path/img-{0238..0315}.jpg', flags=glob.BRACE)

In this example, we've hard-coded the path, but in your case, make sure path_images has a trailing / so that the pattern is constructed correctly. Others have suggested this might be an issue. Print out your pattern to confirm that it is correct.
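To see why the trailing separator matters, here is a small sketch comparing plain string concatenation with os.path.join for a directory given with and without the trailing slash. The directory name 'path' is just a placeholder.

```python
import os

patterns = {}
for path_images in ('path', 'path/'):
    concatenated = path_images + '*.jpg'         # breaks without the trailing /
    joined = os.path.join(path_images, '*.jpg')  # inserts a separator if needed
    patterns[path_images] = (concatenated, joined)
    print(repr(path_images), '->', repr(concatenated), 'vs', repr(joined))
```

Without the trailing slash, concatenation produces 'path*.jpg', which looks for files whose names start with "path" in the current directory rather than for files inside the path directory.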


 