
No output matching file names using split() and glob.glob()

I am trying to find all files in a bunch of subdirectories that have either the form:
sub-num_ses-wavenum_task-name_run-num_info.ext
or
sub-num_ses-wavenum_task-name_info.ext

The run-num part of the file name can take the form run-01 through run-15 or higher, depending on the number of files with matching task-name segments. There is no run-num if there are no duplicate task names.

The script can successfully enter the directories and I can break the file name into chunks by separating at _

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    print(chunks)

Where print(chunks) will give the output:
['sub-num', 'ses-wavenum', 'task-name', 'run-num', 'info.ext']
or, if there is no run-num:
['sub-num', 'ses-wavenum', 'task-name', 'info.ext']

I can also break out the part I want to check to see whether it is a run number or not:

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    print(chunks[-2])

This returns, e.g.:
run-02, if there is a run number, or
task-name, if there is no run number.

But my problem is that I can't seem to list out only those files that have a run number:

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    if chunks[-2] == glob.glob("run-[0-9]{2}"):
                        print(chunks[-2])

Gives me no output at all.

I'm at a loss as to why I can't find the matching string.

Edit 1:
path to files is niidir/sub-num/ses-num/sequence/files

There are multiple sub-num directories in clean_nii and multiple ses-num directories in each sub-num directory. Each ses-num directory contains some or all of the following sequence directories: "anat", "func", or "fmap" in which are the files.

Edit 2: I'm not a programmer. Please don't assume I know what you're talking about, even if it's "basic". I'm trying.

You are using the wrong syntax for the glob, and you are using globbing incorrectly. Your glob() call wants to match a literal {2} string after a single digit, and you are trying to use a function that produces a list of files to test if a string matches a pattern.

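The correct glob spelling of "run- followed by two digits" repeats the character class instead of using {2}:

glob.glob("run-[0-9][0-9]*")

Note that this pattern only matches names that start with run-. Your file names start with sub-, so a pattern meant to match whole file names needs wildcards on both sides, as in "*_run-[0-9][0-9]_*".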

Glob patterns are not regular expressions. See the Wikipedia article on glob syntax and the fnmatch module documentation for details.
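To make the difference concrete, here is a small sketch that feeds the same pattern text to fnmatch (glob rules) and to re (regular expression rules). The sample strings are made up for illustration.

```python
import fnmatch
import re

pattern = "run-[0-9]{2}"

# As a glob, {2} is three literal characters, not "repeat twice".
glob_hits_run = fnmatch.fnmatch("run-02", pattern)
glob_hits_literal = fnmatch.fnmatch("run-0{2}", pattern)

# As a regular expression, {2} means "exactly two of the preceding item".
regex_hits_run = re.fullmatch(pattern, "run-02") is not None

# The glob spelling of "two digits" repeats the character class.
glob_two_digits = fnmatch.fnmatch("run-02", "run-[0-9][0-9]")

print(glob_hits_run, glob_hits_literal, regex_hits_run, glob_two_digits)
```

Only the odd-looking string "run-0{2}" satisfies the glob reading of the original pattern, which is why the comparison in the question can never succeed for a real run number.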

Next, glob.glob() looks up files on the filesystem and returns a list of matching filenames. A pattern without path information only lists files in the current working directory. You'd have to use glob.glob(os.path.join(fullpath, "*_run-[0-9][0-9]_*")) to match specific files in a directory (the leading *_ is needed because your file names start with sub-), at which point the list will consist of full paths. You should not compare that list with a single string; chunks[-2] is never going to be equal to a list of matching filenames.
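As a rough illustration of what glob.glob() hands back, the sketch below creates two empty files in a temporary directory and globs for the ones with a run number. The file names are hypothetical, and the pattern "*_run-[0-9][0-9]_*" assumes the run number sits between two underscores, as in the question.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as fullpath:
    # Hypothetical file names following the naming scheme in the question.
    for name in ("sub-01_ses-1_task-rest_run-02_bold.nii",
                 "sub-01_ses-1_task-nback_bold.nii"):
        open(os.path.join(fullpath, name), "w").close()

    # glob.glob() returns a list of paths, each including the directory.
    matches = glob.glob(os.path.join(fullpath, "*_run-[0-9][0-9]_*"))
    matched_names = [os.path.basename(m) for m in matches]

    # A list never compares equal to a string, whatever it contains.
    list_equals_string = (matches == "run-02")

print(matched_names)
print(list_equals_string)
```

The result is a list even when only one file matches, so any test against it has to look inside the list rather than compare it to a string.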

If you want to see if your string matches a specific globbing pattern, you'd use the fnmatch.fnmatch() function:

if fnmatch.fnmatch(chunks[-2], 'run-[0-9][0-9]'):

Now you are actually testing if your filename part consists of the string run- at the start, followed by two digits.
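Wrapped up as a small helper, that test might look like the sketch below. The function name run_label and the sample file names are made up for illustration; the only assumption about the data is the one stated in the question, that the run number, when present, is the second-to-last underscore-separated chunk.

```python
import fnmatch


def run_label(filename):
    """Return the run-NN chunk of filename, or None if there is none."""
    chunks = filename.split("_")
    # Guard against names without any underscore before indexing [-2].
    if len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], "run-[0-9][0-9]"):
        return chunks[-2]
    return None


print(run_label("sub-01_ses-1_task-rest_run-02_bold.nii"))
print(run_label("sub-01_ses-1_task-nback_bold.nii"))
```

Because the pattern has no trailing wildcard, it accepts exactly two digits; a run number with three or more digits would need a different pattern.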

Next, your for files in fullpath loop iterates over the individual characters of the fullpath string. You are repeating the inner loop len(fullpath) times without any need to repeat anything. You never use the files variable inside it, so the loop is just needless extra work.
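A quick way to see this for yourself is to loop over a path string directly. The path below is only a placeholder, like the one in the question.

```python
fullpath = "some/path/func"  # placeholder path, not a real directory

# Iterating over a string yields one character per pass.
characters = [ch for ch in fullpath]
repeats = len(characters)

print(characters[:4])
print(repeats)
```

Each of those passes would run the whole os.listdir() loop again, which is why every file would be printed many times over.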

Finally, your code is still doing more work than it needs to. os.walk() already lists the filenames in each directory, yet your code lists them again with an os.listdir() call. Either prune your dirnames list after you have found those specific subdirectories, or test dirpath for a matching subdirectory and process files instead:

import os
import os.path
import fnmatch

niidir="some/path" 

for dirpath, dirnames, files in os.walk(niidir):
    directory_name = os.path.basename(dirpath)
    if directory_name not in {'fmap', 'anat', 'func'}:
        # Only process files in specific subdirectories
        continue
    # The leading *_ is needed because the file names start with sub-
    for filename in fnmatch.filter(files, "*_run-[0-9][0-9]_*"):
        # process matching file
        print(os.path.join(dirpath, filename))

I used the fnmatch.filter() function to select the matching names from the files list produced by os.walk().

Alternatively, stick to fnmatch.fnmatch() if you want to process all files in the directory and only test specific files in the larger list for your pattern:

for dirpath, dirnames, files in os.walk(niidir):
    directory_name = os.path.basename(dirpath)
    if directory_name not in {'fmap', 'anat', 'func'}:
        # Only process files in specific subdirectories
        continue
    for filename in files:
        chunks = filename.split('_')
        # The run number, if any, is the second-to-last chunk
        if len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], 'run-[0-9][0-9]'):
            # filename contains a run number
            print(filename)
        else:
            # do something else
            pass
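The other option mentioned above, pruning dirnames, can be sketched as follows. This is one possible arrangement rather than the only one: the function name files_with_run_number, the WANTED set, and the throwaway directory tree are all made up for illustration, and the code assumes nothing of interest lives below a sequence directory.

```python
import fnmatch
import os
import tempfile

WANTED = {"fmap", "anat", "func"}


def files_with_run_number(niidir):
    """Yield (directory, filename) for files whose second-to-last chunk is run-NN."""
    for dirpath, dirnames, files in os.walk(niidir):
        if os.path.basename(dirpath) not in WANTED:
            continue
        # Emptying dirnames in place tells os.walk() not to descend further.
        dirnames[:] = []
        for filename in files:
            chunks = filename.split("_")
            if len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], "run-[0-9][0-9]"):
                yield dirpath, filename


# Build a throwaway tree shaped like niidir/sub-num/ses-num/sequence/files.
with tempfile.TemporaryDirectory() as niidir:
    func_dir = os.path.join(niidir, "sub-01", "ses-1", "func")
    os.makedirs(func_dir)
    for name in ("sub-01_ses-1_task-rest_run-01_bold.nii",
                 "sub-01_ses-1_task-rest_run-02_bold.nii",
                 "sub-01_ses-1_task-nback_bold.nii"):
        open(os.path.join(func_dir, name), "w").close()

    found = sorted(name for _, name in files_with_run_number(niidir))

print(found)
```

To try it on real data, call files_with_run_number() with your own niidir instead of the temporary directory.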
