
How to efficiently check if a folder contains a list of files?

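I want to check whether all of the files (B01:B12) exist in a certain folder. If they do, the function should return True. I know how the file names end, but the beginnings may vary.

Currently, I have the following code. It works, but I feel it could be done more efficiently. Does anyone know how to improve this?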

import os

def Check3(filename, root):
    path = os.path.join(root, filename)
    os.chdir(path)
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('_B01.jp2'):
                B01 = True
            elif filename.endswith('_B02.jp2'):
                B02 = True
            elif filename.endswith('_B03.jp2'):
                B03 = True
            elif filename.endswith('_B04.jp2'):
                B04 = True
            elif filename.endswith('_B05.jp2'):
                B05 = True
            elif filename.endswith('_B06.jp2'):
                B06 = True
            elif filename.endswith('_B07.jp2'):
                B07 = True
            elif filename.endswith('_B08.jp2'):
                B08 = True
            elif filename.endswith('_B8A.jp2'):
                B8A = True
            elif filename.endswith('_B09.jp2'):
                B09 = True
            elif filename.endswith('_B10.jp2'):
                B10 = True
            elif filename.endswith('_B11.jp2'):
                B11 = True
            elif filename.endswith('_B12.jp2'):
                B12 = True

    return B01 and B02 and B03 and B04 and B05 and B06 and B07\
     and B08 and B8A and B09 and B10 and B11 and B12

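You can use pathlib to collect all the files, take the last 8 characters of each file name, build the set of expected suffixes, and then compare the two.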

from pathlib import Path

all_last8 = set()
for path in Path(r'your directory').rglob('*.jp2'):
    # extract the last 8 characters of the file name, e.g. '_B01.jp2'
    all_last8.add(path.name[-8:])
# construct all expected suffixes
# hardcoding them like this is just as fast at run time,
# only more verbose
expected = {'_B01.jp2', '_B02.jp2', '_B03.jp2', }  # ...
# if they all follow the same pattern ('_B8A.jp2' would still have to be added by hand):
# expected = {f'_B{i:02d}.jp2' for i in range(1, 13)}

valid = all_last8.issuperset(expected)
print(valid)

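This code first collects every file name and its suffix. There may be a more efficient approach that does the comparison while globbing.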
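One way to compare during the traversal, rather than after it, is to keep a set of suffixes that are still missing and stop as soon as it is empty. The following is only a sketch under that assumption; the function name `has_all_suffixes` is made up for illustration and is not part of the answer above.

```python
import os


def has_all_suffixes(root, suffixes):
    """Return True once every suffix has matched some file under root."""
    remaining = set(suffixes)
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Drop every suffix that this file name satisfies.
            remaining -= {s for s in remaining if name.endswith(s)}
            if not remaining:
                return True  # everything found, stop walking early
    # Reached only when the walk ended without finding everything
    # (or when no suffixes were requested in the first place).
    return not remaining
```

With complete folders this avoids scanning the rest of the tree after the last required file has been seen; with incomplete folders it still has to visit every file.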

You can use the glob library, which lists the files under the folder you want to check that match a given pattern.

from glob import glob

def Check3(root):
    # list the .jp2 files one directory level below root
    files = glob('{}/*/*.jp2'.format(root))

    # suffixes that must all be present
    extensions_check_list = ['_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
                             '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2']

    # True only if every expected suffix matches at least one of the files found
    # (comparing full paths against the suffix list with `in` would never match)
    return all(any(file.endswith(suffix) for file in files) for suffix in extensions_check_list)
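A variation on the glob idea is to issue one pattern per suffix and only ask whether that pattern matches anything. The sketch below uses `Path.rglob` from the standard library; the function name `all_suffixes_present` is hypothetical, and it assumes the suffixes contain no glob metacharacters such as `*` or `[`.

```python
from pathlib import Path


def all_suffixes_present(root, suffixes):
    """Return True if each suffix matches at least one file under root."""
    root = Path(root)
    # rglob yields matches lazily, so any() stops at the first hit,
    # and all() stops at the first suffix that has no match.
    return all(any(root.rglob('*' + suffix)) for suffix in suffixes)
```

This is short, but it walks the directory tree once per suffix, so for large trees a single pass that collects the names first is likely to be faster.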
Another option uses the third-party wizzi_utils package:

import wizzi_utils as wu  # pip install wizzi_utils


def check_if_sequential(dir_path: str, files_suffix: list) -> bool:
    files_in_dir = wu.find_files_in_folder(dir_path=dir_path, file_suffix='')
    print('files_in_dir:')
    for idx, f in enumerate(files_in_dir):
        print('\t{}: {}'.format(idx + 1, f))
    all_found = True
    for suffix in files_suffix:
        file_with_suffix_found = False
        for file in files_in_dir:
            if file.endswith(suffix):
                file_with_suffix_found = True
                break
        if not file_with_suffix_found:
            print('suffix {} not found'.format(suffix))
            all_found = False
            break
    if all_found:
        print('all files with suffix given found in folder')
    else:
        print('not all files found')
    return all_found


def main() -> None:
    files_suffix = [
        '_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
        '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2',
    ]
    _ = check_if_sequential(dir_path='./my_files', files_suffix=files_suffix)
    return


if __name__ == '__main__':
    main()

If every file suffix is present in the folder (along with 1 extra file that we do not need), the output will be:

files_in_dir:
    1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
    2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
    3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
    4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
    5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B06.jp2
    6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
    7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
    8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
    9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
    10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
    11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
    12: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
    13: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
    14: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
all files with suffix given found in folder

Now delete one and rerun. I deleted bla_B06.jp2, and the output will be:

files_in_dir:
    1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
    2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
    3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
    4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
    5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
    6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
    7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
    8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
    9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
    10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
    11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
    12: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
    13: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
suffix _B06.jp2 not found
not all files found
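The same check can be sketched with the standard library alone, returning the list of missing suffixes instead of printing them. The name `missing_suffixes` is an assumption for illustration, and this version only looks at the top level of the folder (it does not descend into subfolders).

```python
import os


def missing_suffixes(dir_path, suffixes):
    """Return the suffixes that no entry in dir_path ends with."""
    names = os.listdir(dir_path)
    # Keep a suffix only if no entry name ends with it.
    return [s for s in suffixes if not any(n.endswith(s) for n in names)]
```

An empty result means the folder is complete, so `not missing_suffixes(dir_path, files_suffix)` gives the True/False answer the question asks for, while the list itself tells you which files to look for.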

