How to efficiently check if a folder contains a list of files?

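I want to check whether all the files (B01:B12) exist in a certain folder. If they do, the check should return True. I know how the file names end, but the beginning can vary.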

Currently I have the following code. It works, but I feel it could be done more efficiently. Does anyone know how to improve this?

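import os

def Check3(filename, root):
    path = os.path.join(root, filename)
    os.chdir(path)
    # Initialise every band flag so a missing file gives False rather than an UnboundLocalError
    B01 = B02 = B03 = B04 = B05 = B06 = B07 = False
    B08 = B8A = B09 = B10 = B11 = B12 = False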
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('_B01.jp2'):
                B01 = True
            elif filename.endswith('_B02.jp2'):
                B02 = True
            elif filename.endswith('_B03.jp2'):
                B03 = True
            elif filename.endswith('_B04.jp2'):
                B04 = True
            elif filename.endswith('_B05.jp2'):
                B05 = True
            elif filename.endswith('_B06.jp2'):
                B06 = True
            elif filename.endswith('_B07.jp2'):
                B07 = True
            elif filename.endswith('_B08.jp2'):
                B08 = True
            elif filename.endswith('_B8A.jp2'):
                B8A = True
            elif filename.endswith('_B09.jp2'):
                B09 = True
            elif filename.endswith('_B10.jp2'):
                B10 = True
            elif filename.endswith('_B11.jp2'):
                B11 = True
            elif filename.endswith('_B12.jp2'):
                B12 = True

    return B01 and B02 and B03 and B04 and B05 and B06 and B07\
     and B08 and B8A and B09 and B10 and B11 and B12

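You can use pathlib to get all the files, extract the last 8 characters of each file name, build the set of expected suffixes, and then compare the two.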

from pathlib import Path

all_last8 = set()
for path in Path(r'your directory').rglob('*.jp2'):
    # extract the last 8 characters of the file name
    all_last8.add(path.name[-8:])
# construct all expected suffixes
# hardcoding them this way is just as efficient at run time,
# only more verbose
expected = {'_B01.jp2', '_B02.jp2', '_B03.jp2', }  # ...
# if they all follow the same pattern
# (note: this range does not produce '_B8A.jp2'; add it separately if needed)
# expected = set([f'_B{str(i).zfill(2)}.jp2' for i in range(1, 13)])

valid = all_last8.issuperset(expected)
print(valid)

This code first collects all the file names and suffixes; there may be a more efficient way that does the comparison while globbing.
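
One way to compare while globbing is to tick suffixes off as files are found and stop as soon as none remain. This is only a sketch: the function name `has_all_suffixes` is made up for illustration, and it assumes every expected suffix is exactly 8 characters long, as in the answer above.

```python
from pathlib import Path


def has_all_suffixes(root, expected):
    """Return True once every suffix in `expected` has been seen under `root`.

    Hypothetical helper; assumes each suffix is 8 characters, e.g. '_B01.jp2'.
    """
    remaining = set(expected)
    if not remaining:
        return True
    for path in Path(root).rglob('*.jp2'):
        # drop the suffix of this file from the set of suffixes still missing
        remaining.discard(path.name[-8:])
        if not remaining:
            # everything was found, so stop walking the directory tree
            return True
    return False
```

The early return only helps when the folder holds many more files than the ones being checked; otherwise it should behave much like the set comparison above.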

You can use the glob library, which lists the files under the folder you want to check that match a given condition.

from glob import glob

def Check3(root):
    # list the .jp2 files one directory level below root
    files = glob('{}/*/*.jp2'.format(root))

    # suffixes that must all be present ('_B8A.jp2' added to match the question's code)
    extensions_check_list = ['_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2', '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2']

    # True only if every expected suffix matches the end of at least one file found
    return all(any(file.endswith(ext) for file in files) for ext in extensions_check_list)
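
Another option is to let the glob pattern itself do the suffix matching, one pattern per suffix. The sketch below uses hypothetical helper names (`suffix_present`, `all_bands_present`) and searches `root` and all of its subfolders, which may be broader than the single directory level used above.

```python
import glob
import os


def suffix_present(root, suffix):
    """Return True if any file under `root` ends with `suffix`."""
    # escape() keeps special characters in the folder name from acting as wildcards
    pattern = os.path.join(glob.escape(root), '**', '*' + suffix)
    # iglob is lazy, so asking for one match avoids listing every file
    return next(glob.iglob(pattern, recursive=True), None) is not None


def all_bands_present(root, suffixes):
    """Return True only if every suffix matches at least one file."""
    return all(suffix_present(root, s) for s in suffixes)
```

This walks the folder once per suffix, so it trades speed for simplicity; collecting the names once is likely cheaper for a large tree.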

import wizzi_utils as wu  # pip install wizzi_utils


def check_if_sequential(dir_path: str, files_suffix: list) -> bool:
    files_in_dir = wu.find_files_in_folder(dir_path=dir_path, file_suffix='')
    print('files_in_dir:')
    for idx, f in enumerate(files_in_dir):
        print('\t{}: {}'.format(idx + 1, f))
    all_found = True
    for suffix in files_suffix:
        file_with_suffix_found = False
        for file in files_in_dir:
            if file.endswith(suffix):
                file_with_suffix_found = True
                break
        if not file_with_suffix_found:
            print('suffix {} not found'.format(suffix))
            all_found = False
            break
    if all_found:
        print('all files with suffix given found in folder')
    else:
        print('not all files found')
    return all_found


def main() -> None:
    files_suffix = [
        '_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
        '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2',
    ]
    _ = check_if_sequential(dir_path='./my_files', files_suffix=files_suffix)
    return


if __name__ == '__main__':
    main()

If all the file suffixes are present in the folder (plus 1 extra file that we don't need), the output will be:

files_in_dir:
    1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
    2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
    3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
    4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
    5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B06.jp2
    6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
    7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
    8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
    9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
    10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
    11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
    12: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
    13: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
    14: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
all files with suffix given found in folder

Now delete one file and rerun. I deleted bla_B06.jp2, and the output will be:

files_in_dir:
    1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
    2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
    3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
    4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
    5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
    6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
    7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
    8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
    9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
    10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
    11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
    12: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
    13: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
suffix _B06.jp2 not found
not all files found
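
For readers who would rather not install an extra package, the same check can be sketched with the standard library only. The name `find_missing_suffixes` is hypothetical, and this version walks subfolders as well, which may differ from the helper used in the answer above. It returns the missing suffixes instead of printing them.

```python
import os


def find_missing_suffixes(dir_path, suffixes):
    """Return a sorted list of the suffixes that no file under `dir_path` ends with."""
    missing = set(suffixes)
    for _dirpath, _dirnames, filenames in os.walk(dir_path):
        for name in filenames:
            # remove every suffix that this file name satisfies
            missing -= {s for s in missing if name.endswith(s)}
        if not missing:
            # nothing left to look for, so stop walking
            break
    return sorted(missing)
```

An empty list means every suffix was found, so `not find_missing_suffixes(dir_path, suffixes)` gives the True/False answer the question asks for.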

