No output matching file names using split() and glob.glob()

I am trying to find all files in a bunch of subdirectories whose names have one of the following forms:
sub-num_ses-wavenum_task-name_run-num_info.ext
or
sub-num_ses-wavenum_task-name_info.ext

The run-num part of the file name can take the form run-01 through run-15 or higher, depending on the number of files with matching task-name segments. There is no run-num if there are no duplicate task names.

The script can successfully enter the directories, and I can break each file name into chunks by splitting at _:

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    print(chunks)

Where print(chunks) gives the output:
['sub-num', 'ses-wavenum', 'task-name', 'run-num', 'info.ext']
or, if there is no run-num:
['sub-num', 'ses-wavenum', 'task-name', 'info.ext']

I can also break out the part I want to check to see whether it is a run number or not:

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    print(chunks[-2])

Returns, e.g.:
run-02, if there is a run number, or
task-name, if there is no run number.

BUT, my problem is that I can't seem to list out only those files that have a run number:

niidir="some/path"  
for dirpath, dirnames, files in os.walk(niidir): 
    for dirname in dirnames:
        if dirname == "fmap" or dirname == "anat" or dirname == "func":
            fullpath = dirpath + "/" + dirname
            for files in fullpath:
                for file in os.listdir(fullpath):
                    chunks = file.split("_")
                    if chunks[-2] == glob.glob("run-[0-9]{2}"):
                        print(chunks[-2])

This gives me no output at all.

I'm at a loss as to why I can't find the matching string.

Edit 1:
The path to the files is niidir/sub-num/ses-num/sequence/files

There are multiple sub-num directories in clean_nii and multiple ses-num directories in each sub-num directory. Each ses-num directory contains some or all of the following sequence directories: "anat", "func", or "fmap", and the files are inside those.

Edit 2: I'm not a programmer. Please don't assume I know what you're talking about, even if it's "basic". I'm trying.

You are using the wrong syntax for the glob, and you are using globbing incorrectly. Your glob() pattern matches a single digit followed by the literal text {2}, and you are trying to use a function that produces a list of files to test whether a string matches a pattern.

The glob pattern for run- followed by two digits would be:

glob.glob("run-[0-9][0-9]*")

Glob patterns are not regular expressions. See the Wikipedia article on glob syntax, and the fnmatch module, for details.
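To see the difference in practice, here is a minimal sketch using the standard-library fnmatch module, which applies the same pattern rules to plain strings. The sample strings are made up for illustration. In glob syntax the braces are ordinary characters, so {2} only matches if it appears literally in the name:

```python
import fnmatch

# In glob syntax, {2} is not a repeat count; it matches the literal text "{2}".
regex_style = "run-[0-9]{2}"
glob_style = "run-[0-9][0-9]"

regex_style_matches_run = fnmatch.fnmatch("run-02", regex_style)
regex_style_matches_literal = fnmatch.fnmatch("run-0{2}", regex_style)
glob_style_matches_run = fnmatch.fnmatch("run-02", glob_style)

print(regex_style_matches_run)      # False
print(regex_style_matches_literal)  # True
print(glob_style_matches_run)       # True
```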

Next, glob.glob() looks up files on the filesystem and returns a list of matching filenames. The above pattern has no path information, so it only lists files in the current working directory. You'd have to use glob.glob(os.path.join(fullpath, "run-[0-9][0-9]*")) to match specific files in a directory, at which point the list will consist of full paths. You should not compare that list with a single string; chunks[-2] is never going to be equal to a list of matching filenames.
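A small sketch of that behaviour, assuming a throwaway directory with two made-up files (one of them deliberately named so that it begins with run-): glob.glob() hands back a list of paths, and a string compared against a list is simply never equal.

```python
import glob
import os
import tempfile

# Build a throwaway directory with two hypothetical files to search.
with tempfile.TemporaryDirectory() as fullpath:
    for name in ("run-01_info.ext", "notes.txt"):
        open(os.path.join(fullpath, name), "w").close()

    matches = glob.glob(os.path.join(fullpath, "run-[0-9][0-9]*"))
    basenames = [os.path.basename(m) for m in matches]

    # glob.glob() always returns a list, so comparing it to a string is False.
    string_equals_list = ("run-01" == matches)

print(basenames)
print(string_equals_list)
```

This is why the if test in the question never succeeds and nothing is printed.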

If you want to see if your string matches a specific globbing pattern, you'd use the fnmatch.fnmatch() function:

if fnmatch.fnmatch(chunks[-2], 'run-[0-9][0-9]'):

Now you are actually testing if your filename part consists of the string run- at the start, followed by two digits.
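Put together with the split("_") step from the question, the check could look like the sketch below. The helper name has_run_number and the sample file names are hypothetical, chosen only to resemble the naming scheme described in the question.

```python
import fnmatch

def has_run_number(filename):
    """Return True if the second-to-last "_" chunk looks like run-NN."""
    chunks = filename.split("_")
    # Names with fewer than two chunks cannot contain a run segment.
    return len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], "run-[0-9][0-9]")

names = [
    "sub-01_ses-1_task-rest_run-02_bold.nii",
    "sub-01_ses-1_task-rest_bold.nii",
]
with_run = [name for name in names if has_run_number(name)]
print(with_run)
```

Only the first name is kept, because only its second-to-last chunk is run- followed by exactly two digits.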

Next, your for files in fullpath loop iterates over the individual characters of the fullpath string. You are repeating the inner loop len(fullpath) times, without any need to repeat anything. You never use the files variable; you are just doing needless extra work.
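A tiny illustration of that point, using a made-up path string: looping over a string gives you its characters one at a time, so the loop body runs once per character.

```python
fullpath = "some/path/func"  # hypothetical path string

# Looping over a string visits one character per pass, not one file.
passes = [ch for ch in fullpath]

print(passes[:4])                    # ['s', 'o', 'm', 'e']
print(len(passes) == len(fullpath))  # True
```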

Next, your code is still doing more work than it needs to. os.walk() will already list filenames in directories, yet your code lists them redundantly with an os.listdir() call. Either prune your dirnames list after you found those specific subdirectories, or test dirpath for a matching subdirectory and process files instead:

import os
import os.path
import fnmatch

niidir="some/path" 

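for dirpath, dirnames, files in os.walk(niidir):
    directory_name = os.path.basename(dirpath)
    if directory_name not in {'fmap', 'anat', 'func'}:
        # Only process files in specific subdirectories
        continue
    # File names start with sub-..., so allow any text around _run-NN_
    for filename in fnmatch.filter(files, "*_run-[0-9][0-9]_*"):
        # process matching file (printing its full path as a placeholder)
        print(os.path.join(dirpath, filename))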

I used the fnmatch.filter() function to select the matching names from the files list produced by os.walk().

Alternatively, stick to fnmatch.fnmatch() if you want to process all files in the directory and only test individual names against your pattern:

for dirpath, dirnames, files in os.walk(niidir):
    directory_name = os.path.basename(dirpath)
    if directory_name not in {'fmap', 'anat', 'func'}:
        # Only process files in specific subdirectories
        continue
    for filename in files:
        # The run number, if present, is the second-to-last "_" chunk
        chunks = filename.split('_')
        if len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], 'run-[0-9][0-9]'):
            # filename has a run number
            print(filename)
        else:
            # do something else
            pass
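If you want to try the whole approach without touching your real data, the following is a self-contained sketch under some assumptions: the function name find_run_files, the temporary directory, and the sample file names are all made up here, and the layout simply mimics the niidir/sub-num/ses-num/sequence/files structure from the question.

```python
import fnmatch
import os
import tempfile

def find_run_files(niidir):
    """Return sorted paths of files under fmap/anat/func whose name has a run-NN chunk."""
    found = []
    for dirpath, dirnames, files in os.walk(niidir):
        if os.path.basename(dirpath) not in {"fmap", "anat", "func"}:
            continue
        for filename in files:
            chunks = filename.split("_")
            if len(chunks) >= 2 and fnmatch.fnmatch(chunks[-2], "run-[0-9][0-9]"):
                found.append(os.path.join(dirpath, filename))
    return sorted(found)

# Hypothetical layout: niidir/sub-num/ses-num/sequence/files
with tempfile.TemporaryDirectory() as niidir:
    func_dir = os.path.join(niidir, "sub-01", "ses-1", "func")
    os.makedirs(func_dir)
    for name in ("sub-01_ses-1_task-rest_run-01_bold.nii",
                 "sub-01_ses-1_task-rest_run-02_bold.nii",
                 "sub-01_ses-1_task-nback_bold.nii"):
        open(os.path.join(func_dir, name), "w").close()

    run_files = [os.path.basename(p) for p in find_run_files(niidir)]

print(run_files)
```

With your own data you would call find_run_files() on your real niidir instead of the temporary directory; the file without a run number is skipped.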
