How to efficiently check if a folder contains a list of files?
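I want to check whether all of the files (B01 through B12) exist in a given folder. If they do, the function should return True. I know how the file names end, but the beginnings can vary.
At the moment I have the following code. It works, but I feel it could be done more efficiently. Does anyone know how to improve it?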
import os

def Check3(filename, root):
    path = os.path.join(root, filename)
    os.chdir(path)
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('_B01.jp2'):
                B01 = True
            elif filename.endswith('_B02.jp2'):
                B02 = True
            elif filename.endswith('_B03.jp2'):
                B03 = True
            elif filename.endswith('_B04.jp2'):
                B04 = True
            elif filename.endswith('_B05.jp2'):
                B05 = True
            elif filename.endswith('_B06.jp2'):
                B06 = True
            elif filename.endswith('_B07.jp2'):
                B07 = True
            elif filename.endswith('_B08.jp2'):
                B08 = True
            elif filename.endswith('_B8A.jp2'):
                B8A = True
            elif filename.endswith('_B09.jp2'):
                B09 = True
            elif filename.endswith('_B10.jp2'):
                B10 = True
            elif filename.endswith('_B11.jp2'):
                B11 = True
            elif filename.endswith('_B12.jp2'):
                B12 = True
    return B01 and B02 and B03 and B04 and B05 and B06 and B07\
        and B08 and B8A and B09 and B10 and B11 and B12
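You can use pathlib to get all the files, extract the last 8 characters of each file name, build the set of expected suffixes, and finally compare the two.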
from pathlib import Path

all_last8 = set()
for path in Path(r'your directory').rglob('*.jp2'):
    # extract the last 8 characters of the file name
    all_last8.add(path.name[-8:])

# construct all expected suffixes
# hardcoding them like this is just as efficient at run time,
# only more verbose
expected = {'_B01.jp2', '_B02.jp2', '_B03.jp2', }  # ...
# if they all follow the same pattern
# (note: '_B8A.jp2' does not fit this pattern and would have to be added separately)
# expected = set([f'_B{str(i).zfill(2)}.jp2' for i in range(1, 13)])
valid = all_last8.issuperset(expected)
print(valid)
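This code first collects all the file names and their suffixes; there may be a more efficient way that does the comparison while globbing.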
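One way to compare while globbing is to track the suffixes that are still missing and stop as soon as that set is empty. The following is a minimal sketch under that assumption, not a measured improvement; the names `EXPECTED_SUFFIXES` and `has_all_bands` are hypothetical, and the slice relies on every suffix being exactly 8 characters long, as in the question.

```python
from pathlib import Path

# Suffixes taken from the question; '_B8A.jp2' does not fit the numeric pattern.
EXPECTED_SUFFIXES = frozenset(
    [f'_B{i:02d}.jp2' for i in range(1, 13)] + ['_B8A.jp2']
)


def has_all_bands(root, expected=EXPECTED_SUFFIXES):
    """Return True if every expected suffix is seen somewhere under root."""
    missing = set(expected)
    for path in Path(root).rglob('*.jp2'):
        # Every suffix is 8 characters long, so a fixed slice is enough.
        missing.discard(path.name[-8:])
        if not missing:
            return True  # stop walking as soon as nothing is missing
    return not missing
```

Because `rglob` yields paths lazily, returning inside the loop avoids visiting the rest of the tree once all suffixes have been found.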
You can use the glob library, which lists the files under the folder you want to check that match a given pattern.
from glob import glob

def Check3(root):
    # list the files which match a specific condition
    files = glob('{}/*/*.jp2'.format(root))
    # create the list of suffixes you want to check for
    extensions_check_list = ['_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2', '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2']
    # return True only if every expected suffix is matched by at least one found file
    # (comparing full paths against the suffix list with "in" would never match)
    return all(any(file.endswith(ext) for file in files) for ext in extensions_check_list)
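A variation on the glob idea is to let the pattern itself carry the suffix, issuing one lazy glob per suffix and stopping at the first one that has no match. This is only a sketch: the function name `check_bands_with_glob` is hypothetical, it assumes the files sit exactly one folder level below `root` (as in the pattern above), and since it scans the directories once per suffix it is not necessarily faster.

```python
import glob
import os


def check_bands_with_glob(root, suffixes):
    """Return True if every suffix matches a file one level below root."""
    for suffix in suffixes:
        # glob.escape protects special characters such as '[' in root.
        pattern = os.path.join(glob.escape(root), '*', '*' + suffix)
        # iglob is lazy, so next() stops at the first match.
        if next(glob.iglob(pattern), None) is None:
            return False
    return True
```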
import wizzi_utils as wu  # pip install wizzi_utils

def check_if_sequential(dir_path: str, files_suffix: list) -> bool:
    files_in_dir = wu.find_files_in_folder(dir_path=dir_path, file_suffix='')
    print('files_in_dir:')
    for idx, f in enumerate(files_in_dir):
        print('\t{}: {}'.format(idx + 1, f))
    all_found = True
    for suffix in files_suffix:
        file_with_suffix_found = False
        for file in files_in_dir:
            if file.endswith(suffix):
                file_with_suffix_found = True
                break
        if not file_with_suffix_found:
            print('suffix {} not found'.format(suffix))
            all_found = False
            break
    if all_found:
        print('all files with suffix given found in folder')
    else:
        print('not all files found')
    return all_found

def main() -> None:
    files_suffix = [
        '_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
        '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2',
    ]
    _ = check_if_sequential(dir_path='./my_files', files_suffix=files_suffix)
    return

if __name__ == '__main__':
    main()
If all the file suffixes are present in the folder (plus 1 extra file that we don't need), the output will be:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B06.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
13: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
14: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
all files with suffix given found in folder
Now delete one and rerun. I deleted bla_B06.jp2, and the output will be:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
13: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
suffix _B06.jp2 not found
not all files found
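The same check can be sketched with the standard library alone, in case installing an extra package is not wanted. The function name `missing_suffixes` is hypothetical, and the sketch assumes a flat folder (no recursion); it returns the suffixes that were not found instead of printing them.

```python
import os


def missing_suffixes(dir_path, suffixes):
    """Return the suffixes that no file directly inside dir_path ends with."""
    with os.scandir(dir_path) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    # any() stops at the first file name that carries the suffix.
    return [s for s in suffixes if not any(n.endswith(s) for n in names)]
```

An empty list means every suffix was found, so `not missing_suffixes(dir_path, suffixes)` gives the True/False answer the question asks for.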