
File name matching with fnmatch

I have a directory of files whose names follow the format: LnLnnnnLnnn.txt

where L is a letter and n is a digit, e.g. p2c0789c001.txt

I would like to separate these files based on whether the second number (i.e. 0789) falls within a certain range (e.g. 0001 to 0146).

Is there an easy way to do this with fnmatch? Or should I be using regex?

This is the code I have so far:

import fnmatch
import os

out_files = []
for root, dirs, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, '???[0-9][0-9][0-9][0-9]????*.txt'):
        out_files.append(os.path.join(root, filename))

You can't do it easily inside fnmatch.filter(), but you can check the number yourself after filtering:

import fnmatch
import os

out_files = []
for root, dirs, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, '???[0-9][0-9][0-9][0-9]????*.txt'):
        # Characters 3-6 are the four digits matched by the pattern.
        if 1 <= int(filename[3:7]) <= 146:
            out_files.append(os.path.join(root, filename))

Or, for the list-comprehension fans:

import os
import fnmatch
out_files = [os.path.join(root, filename)
             for root, dirs, filenames in os.walk('.')
             for filename in fnmatch.filter(filenames,
                                            '???[0-9][0-9][0-9][0-9]????*.txt')
             if 1 <= int(filename[3:7]) <= 146]
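
Since the question asks to separate the files rather than only collect the matches, the same slice check can feed a small partition helper. This is a minimal sketch, assuming the name layout from the question; partition_by_number, low, and high are hypothetical names chosen for illustration, not part of fnmatch.

```python
import fnmatch

PATTERN = '???[0-9][0-9][0-9][0-9]????*.txt'

def partition_by_number(filenames, low=1, high=146):
    """Split matching names into (in_range, out_of_range) lists.

    The number is read from characters 3-6 of each name.
    """
    in_range, out_of_range = [], []
    for name in fnmatch.filter(filenames, PATTERN):
        # The pattern guarantees name[3:7] is four digits, so int() is safe here.
        if low <= int(name[3:7]) <= high:
            in_range.append(name)
        else:
            out_of_range.append(name)
    return in_range, out_of_range
```

Names that do not match the pattern at all are dropped from both lists, which mirrors what fnmatch.filter() does in the loops above.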

EDIT: Whoops, forgot an extra for loop. Also, see if this has better performance.

EDIT2: In case the first letter is also a c, this checks the second-to-last element of the split, which exists for any name that follows the format, whether or not it starts with a c.

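out_files = []
for root, dirs, filenames in os.walk('.'):
    for filename in filenames:
        try:
            # The piece between the last two c's is the number to test.
            if 1 <= int(filename.split('c')[-2]) <= 146:
                out_files.append(...)
        except (IndexError, ValueError):
            # No 'c' in the name, or the piece is not a number.
            continue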

Alternatively, using a generator:

out_files = []
for root, dirs, filenames in os.walk('.'):
    for filename in (name for name in filenames if 'c' in name):
        if 1 <= int(filename.split('c')[-2]) <= 146:
            out_files.append(...)
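
To see why the second-to-last element is the right one, it may help to look at what split('c') returns for names with and without a leading c. The sketch below wraps the lookup in a hypothetical helper, middle_number, which is not from the answer above; it also returns None for names where the piece is missing or not numeric.

```python
def middle_number(filename):
    """Return the integer between the last two 'c' characters, or None."""
    parts = filename.split('c')
    try:
        return int(parts[-2])
    except (IndexError, ValueError):
        # Fewer than two parts, or the piece is not a number.
        return None

# 'p2c0789c001.txt'.split('c') gives ['p2', '0789', '001.txt']
print(middle_number('p2c0789c001.txt'))
# 'c2c0789c001.txt'.split('c') gives ['', '2', '0789', '001.txt']
print(middle_number('c2c0789c001.txt'))
# 'readme.txt' contains no 'c', so there is no second-to-last piece
print(middle_number('readme.txt'))
```

In both well-formed cases the index -2 lands on '0789', because counting from the end is unaffected by extra pieces at the front.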

In case there are other c's at the start of the string or the string length before the numbers changes:

if 1 <= int(re.findall(r"c([0-9]+)c", s)[0]) <= 487:

Or if there are always four digits:

if 1 <= int(re.findall(r"c(\d{4})c", s)[0]) <= 487:
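
Note that indexing the result of re.findall() with [0] raises an IndexError when a name contains no match. A minimal sketch of a variant that tolerates such names, using re.search() instead; number_in_range, low, and high are hypothetical names, and the default bounds are simply the ones used in the snippets above.

```python
import re

# Four digits enclosed by two c's, as in the second pattern above.
NUMBER_RE = re.compile(r"c(\d{4})c")

def number_in_range(name, low=1, high=487):
    """Return True if the four digits between two c's fall within [low, high]."""
    match = NUMBER_RE.search(name)
    if match is None:
        # The name does not contain the c<digits>c group at all.
        return False
    return low <= int(match.group(1)) <= high
```

This could be used as the condition in either loop above, for example `if number_in_range(filename):`.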

