Faster way to check whether an image can be opened?
I have a 200 GB folder of images, and some of them cannot be opened. I want to find those images and delete them from their folders.
I tried Python code like this:
for image in all_images:
    try:
        ...  # open image
    except Exception:
        ...  # delete image
But it is too slow. How can I make it faster?
How can I parallelize this code?
import os
from pathlib import Path

import pandas as pd
from PIL import Image
from tqdm import tqdm


def create_df(data_path):
    # One row per .jpg file; the label is the name of the parent folder.
    data = pd.DataFrame()
    folders = [i for i in data_path.iterdir() if i.is_dir()]
    files = [j for i in folders for j in i.glob('*.jpg')]
    data['path'] = [str(i) for i in files]
    data['label'] = [i.parts[-2] for i in files]
    return data


if __name__ == "__main__":
    root = Path('some_path')
    data_root = root / 'dataset'
    df = create_df(data_root)
    for i, row in tqdm(df.iterrows()):
        try:
            img = Image.open(row.path)
        except Exception:
            # The file could not be opened as an image: report and delete it.
            print(row.path)
            print(row)
            if os.path.exists(row.path):
                os.remove(row.path)
You can use multiprocessing (here, its thread pool) to run the checks in parallel. Example:
import os
from PIL import Image
from multiprocessing.pool import ThreadPool

IMAGE_EXT = ('.jpg', '.jpeg', '.png', '.gif')


def check_image(image_path):
    try:
        # Close the file handle again as soon as the open succeeds.
        with Image.open(image_path):
            pass
        print(f'Image is OK: {image_path}')
    except Exception:
        os.remove(image_path)
        print(f'Image deleted: {image_path}')


def delete_broken_images(root_dir):
    image_paths = [
        os.path.join(subdir, file)
        for subdir, dirs, files in os.walk(root_dir)
        for file in files
        if file.endswith(IMAGE_EXT)
    ]
    # Hand all paths to the pool before waiting. Calling .get() on each
    # apply_async result right away would run the checks one at a time.
    with ThreadPool(processes=10) as pool:
        pool.map(check_image, image_paths)


delete_broken_images(r'c:\so\69805310\images')
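One caveat: `Image.open` is lazy. It identifies the file from its header and does not decode the pixel data, so a file with a valid header but damaged contents can still pass the check. Below is a minimal sketch of a somewhat stricter variant, assuming Pillow is installed. It calls `Image.verify()` and returns the broken paths instead of deleting them inside the worker. The names `is_broken`, `find_broken_images` and `max_workers=8` are illustrative choices, not part of the original answer. How much `verify()` catches depends on the file format; fully decoding with `img.load()` is more thorough but slower.

```python
import os
from concurrent.futures import ThreadPoolExecutor

from PIL import Image

IMAGE_EXT = ('.jpg', '.jpeg', '.png', '.gif')


def is_broken(image_path):
    """Return True if Pillow cannot open or verify the file."""
    try:
        with Image.open(image_path) as img:
            img.verify()  # integrity check without decoding the full image
        return False
    except Exception:
        return True


def find_broken_images(root_dir, max_workers=8):
    """Walk root_dir and return the paths of images that fail the check."""
    paths = [
        os.path.join(subdir, name)
        for subdir, _dirs, files in os.walk(root_dir)
        for name in files
        if name.lower().endswith(IMAGE_EXT)
    ]
    # executor.map keeps the input order, so flags line up with paths.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        flags = list(executor.map(is_broken, paths))
    return [path for path, broken in zip(paths, flags) if broken]
```

Returning the list leaves the deletion as a separate step, for example `for path in find_broken_images(root): os.remove(path)`, so the result can be reviewed first.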