How to efficiently check if a folder contains a list of files?
I want to check whether all of the files (B01 to B12) are present in a certain folder. If they are, the check should return True. I know how the filenames end, but the beginning can vary.
Currently, I have the following code. It works, but I feel it could be done much more efficiently. Does anyone have an idea how to improve this?
import os

def Check3(filename, root):
    path = os.path.join(root, filename)
    os.chdir(path)
    # start with every band marked as missing, so the return below never reads an unassigned name
    B01 = B02 = B03 = B04 = B05 = B06 = B07 = False
    B08 = B8A = B09 = B10 = B11 = B12 = False
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith('_B01.jp2'):
                B01 = True
            elif filename.endswith('_B02.jp2'):
                B02 = True
            elif filename.endswith('_B03.jp2'):
                B03 = True
            elif filename.endswith('_B04.jp2'):
                B04 = True
            elif filename.endswith('_B05.jp2'):
                B05 = True
            elif filename.endswith('_B06.jp2'):
                B06 = True
            elif filename.endswith('_B07.jp2'):
                B07 = True
            elif filename.endswith('_B08.jp2'):
                B08 = True
            elif filename.endswith('_B8A.jp2'):
                B8A = True
            elif filename.endswith('_B09.jp2'):
                B09 = True
            elif filename.endswith('_B10.jp2'):
                B10 = True
            elif filename.endswith('_B11.jp2'):
                B11 = True
            elif filename.endswith('_B12.jp2'):
                B12 = True
    return B01 and B02 and B03 and B04 and B05 and B06 and B07\
        and B08 and B8A and B09 and B10 and B11 and B12
You can use pathlib to collect all the files, take the last 8 characters of each file name, build the set of expected suffixes, and then compare the two.
from pathlib import Path

all_last8 = set()
for path in Path(r'your directory').rglob('*.jp2'):
    # extract the last 8 characters of the file name
    all_last8.add(path.name[-8:])

# construct all expected suffixes
# hardcoding them like this is just as efficient at run time, only more verbose
expected = {'_B01.jp2', '_B02.jp2', '_B03.jp2', }  # ...
# if they all follow the same pattern
# expected = set([f'_B{str(i).zfill(2)}.jp2' for i in range(1, 13)])
valid = all_last8.issuperset(expected)
print(valid)
This code first collects every file name and its suffix; there may be more efficient ways that compare while globbing.
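One possible way to compare while globbing is sketched below: each expected suffix is dropped from a set as soon as a matching file turns up, and the walk stops once nothing is missing. The names `has_all_bands` and `BAND_SUFFIXES` are made up for this illustration, and the suffix list is the one from the question (including `_B8A`, which does not fit the numeric pattern).

```python
from pathlib import Path

# Suffixes taken from the question; the names here are illustrative only.
BAND_SUFFIXES = frozenset(
    [f'_B{i:02d}.jp2' for i in range(1, 13)] + ['_B8A.jp2']
)


def has_all_bands(root, suffixes=BAND_SUFFIXES):
    """Return True if every suffix matches at least one .jp2 file under root."""
    missing = set(suffixes)
    for path in Path(root).rglob('*.jp2'):
        # drop every suffix that this file name satisfies
        missing -= {s for s in missing if path.name.endswith(s)}
        if not missing:
            return True  # stop early: nothing left to look for
    return not missing
```

Whether the early exit saves much time depends on how many files the folder holds beyond the ones being checked.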
You could use the glob library, which lists the files matching a given pattern under the folders you want to check.
from glob import glob

def Check3(root):
    # list the .jp2 files one directory level below root
    files = glob('{}/*/*.jp2'.format(root))
    # suffixes that must all be present (includes _B8A, as in the question)
    extensions_check_list = ['_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2', '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2']
    # True only if every expected suffix matches at least one of the files found
    return all(any(file.endswith(ext) for file in files) for ext in extensions_check_list)
import wizzi_utils as wu  # pip install wizzi_utils


def check_if_sequential(dir_path: str, files_suffix: list) -> bool:
    files_in_dir = wu.find_files_in_folder(dir_path=dir_path, file_suffix='')
    print('files_in_dir:')
    for idx, f in enumerate(files_in_dir):
        print('\t{}: {}'.format(idx + 1, f))

    all_found = True
    for suffix in files_suffix:
        file_with_suffix_found = False
        for file in files_in_dir:
            if file.endswith(suffix):
                file_with_suffix_found = True
                break
        if not file_with_suffix_found:
            print('suffix {} not found'.format(suffix))
            all_found = False
            break

    if all_found:
        print('all files with suffix given found in folder')
    else:
        print('not all files found')
    return all_found


def main() -> None:
    files_suffix = [
        '_B01.jp2', '_B02.jp2', '_B03.jp2', '_B04.jp2', '_B05.jp2', '_B06.jp2', '_B07.jp2',
        '_B08.jp2', '_B8A.jp2', '_B09.jp2', '_B10.jp2', '_B11.jp2', '_B12.jp2',
    ]
    _ = check_if_sequential(dir_path='./my_files', files_suffix=files_suffix)
    return


if __name__ == '__main__':
    main()
If files with all the suffixes are in the folder (plus one extra file that we don't need), the output will be:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B06.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
13: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
14: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
all files with suffix given found in folder
Now delete one and rerun. I deleted bla_B06.jp2, and the output will be:
files_in_dir:
1: D:/workspace/2021wizzi_utils/temp/my_files/bla_B01.jp2
2: D:/workspace/2021wizzi_utils/temp/my_files/bla_B02.jp2
3: D:/workspace/2021wizzi_utils/temp/my_files/bla_B03.jp2
4: D:/workspace/2021wizzi_utils/temp/my_files/bla_B04.jp2
5: D:/workspace/2021wizzi_utils/temp/my_files/bla_B07.jp2
6: D:/workspace/2021wizzi_utils/temp/my_files/bla_B08.jp2
7: D:/workspace/2021wizzi_utils/temp/my_files/bla_B09.jp2
8: D:/workspace/2021wizzi_utils/temp/my_files/bla_B10.jp2
9: D:/workspace/2021wizzi_utils/temp/my_files/bla_B11.jp2
10: D:/workspace/2021wizzi_utils/temp/my_files/bla_B12.jp2
11: D:/workspace/2021wizzi_utils/temp/my_files/bla_B8A.jp2
12: D:/workspace/2021wizzi_utils/temp/my_files/random_file.txt
13: D:/workspace/2021wizzi_utils/temp/my_files/x_B05.jp2
suffix _B06.jp2 not found
not all files found
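If installing wizzi_utils is not an option, roughly the same check can be sketched with the standard library alone. This is not the library's implementation: `missing_suffixes` is a hypothetical helper, it only looks at files directly inside the folder (no recursion), and it returns the suffixes that have no matching file, so an empty list means the folder is complete.

```python
import os


def missing_suffixes(dir_path, suffixes):
    """Return the suffixes that no file directly inside dir_path ends with."""
    # os.scandir lists only the top level of dir_path
    with os.scandir(dir_path) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    # any() stops at the first file name that ends with the suffix
    return [s for s in suffixes if not any(n.endswith(s) for n in names)]
```

With the suffix list from main() and the folder contents of the second run, the result would be ['_B06.jp2'].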