[英]How to efficiently check if a folder contains a list of files?
我想检查某个文件夹中是否存在所有文件 (B01:B12)。 如果是这种情况,它应该返回True
。 我知道文件名的结尾,但开头可能会有所不同。
目前,我有以下代码。 它有效,但我觉得它可以做得更有效率。 有没有人知道如何改进这个?
def Check3(filename, root):
path = os.path.join(root, filename)
os.chdir(path)
for dirpath, dirnames, filenames in os.walk(path):
for filename in filenames:
if filename.endswith('_B01.jp2'):
B01 = True
elif filename.endswith('_B02.jp2'):
B02 = True
elif filename.endswith('_B03.jp2'):
B03 = True
elif filename.endswith('_B04.jp2'):
B04 = True
elif filename.endswith('_B05.jp2'):
B05 = True
elif filename.endswith('_B06.jp2'):
B06 = True
elif filename.endswith('_B07.jp2'):
B07 = True
elif filename.endswith('_B08.jp2'):
B08 = True
elif filename.endswith('_B8A.jp2'):
B8A = True
elif filename.endswith('_B09.jp2'):
B09 = True
elif filename.endswith('_B10.jp2'):
B10 = True
elif filename.endswith('_B11.jp2'):
B11 = True
elif filename.endswith('_B12.jp2'):
B12 = True
return B01 and B02 and B03 and B04 and B05 and B06 and B07\
and B08 and B8A and B09 and B10 and B11 and B12
您可以使用pathlib
获取所有文件,从文件名中提取最后 8 个字符,然后构建预期的后缀,最后进行比较。
from pathlib import Path
all_last8 = set()
for path in Path(r'your directory').rglob('*.jp2'):
# exract last 8 chars of file name
all_last8.add(path.name[-8:])
# construct all expected suffixes
# hardcode this way, it is same run time efficient
# more verbose though
expected = {'_B01.jp2', '_B02.jp2', '_B03.jp2', } # ...
# if they are of same pattern
# expected = set([f'_B{str(i).zfill(2)}.jp2' for i in range(1, 13)])
valid = all_last8.issuperset(expected)
print(valid)
该代码首先获取所有文件名和后缀,可能有更有效的方法在全局比较时进行比较。
您可以使用glob
库,它会在您要检查的文件夹下列出符合给定条件的文件。
from glob import glob
def Check3(root):
# list the files which match a specific condition
files = glob('{}/*/*.jp2'.format(root))
# create the list of files you want to check that exists
extensions_check_list = ['_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2', '_B08.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2']
# if the number of found files is equal to the number of the expected returns True
return sum([file in extensions_check_list for file in files]) == len(extensions_check_list)
import wizzi_utils as wu # pip install wizzi_utils
def check_if_sequential(dir_path: str, files_suffix: list) -> bool:
files_in_dir = wu.find_files_in_folder(dir_path=dir_path, file_suffix='')
print('files_in_dir:')
for idx, f in enumerate(files_in_dir):
print('\t{}: {}'.format(idx + 1, f))
all_found = True
for suffix in files_suffix:
file_with_suffix_found = False
for file in files_in_dir:
if file.endswith(suffix):
file_with_suffix_found = True
break
if not file_with_suffix_found:
print('suffix {} not found'.format(suffix))
all_found = False
break
if all_found:
print('all files with suffix given found in folder')
else:
print('not all files found')
return all_found
def main() -> None:
files_suffix = [
'_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
'_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2',
]
_ = check_if_sequential(dir_path='./my_files', files_suffix=files_suffix)
return
if __name__ == '__main__':
main()
如果所有文件后缀都在文件夹中(以及我们不需要的 1 个额外文件),output 将是:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B06.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
13: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
14: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
all files with suffix given found in folder
现在删除一个并重新运行。 我删除了 bla_B06.jp2,output 将是:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
13: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
suffix _B06.jp2 not found
not all files found
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.