
Is there a faster way to check whether an image can be opened?

I have a 200 GB folder of images, and some of them cannot be opened. I want to find those images and delete them from their folders.

I tried Python code like this:

for image in all_image:
    try:
        ...  # open image
    except Exception:
        ...  # delete image

But it is too slow. How can I make it faster?

How can I parallelize this code?

import os
from pathlib import Path

import pandas as pd
from PIL import Image
from tqdm import tqdm


def create_df(data_path):
    data = pd.DataFrame()
    # Each subfolder is one label; collect every .jpg inside each of them.
    sku_dirs = [i for i in data_path.iterdir() if i.is_dir()]
    files = [j for i in sku_dirs for j in i.glob('*.jpg')]
    data['path'] = [str(i) for i in files]
    data['label'] = [i.parts[-2] for i in files]
    return data

if __name__ == "__main__":
    root = Path('some_path')
    data_root = root / 'dataset'
    df = create_df(data_root)

    # total= lets tqdm show real progress, since iterrows() has no length.
    for i, row in tqdm(df.iterrows(), total=len(df)):
        try:
            img = Image.open(row.path)
        except Exception:
            print(row.path)
            print(row)
            if os.path.exists(row.path):
                os.remove(row.path)

You can use multiprocessing to parallelize the work (the example below uses its thread-based ThreadPool). For example:

import os
from PIL import Image
from multiprocessing.pool import ThreadPool

IMAGE_EXT = ('.jpg', '.jpeg', '.png', '.gif')


def check_image(image_path):
    try:
        # Image.open() only reads the header; close the file right away.
        with Image.open(image_path):
            pass
        print(f'Image is OK: {image_path}')
    except Exception:
        os.remove(image_path)
        print(f'Image deleted: {image_path}')


def delete_broken_images(root_dir):
    # Gather all candidate paths first, then hand them to the pool.
    image_paths = [
        os.path.join(subdir, file)
        for subdir, dirs, files in os.walk(root_dir)
        for file in files
        if file.endswith(IMAGE_EXT)
    ]
    # map() spreads the paths over the 10 worker threads and waits
    # until every file has been checked.
    with ThreadPool(processes=10) as pool:
        pool.map(check_image, image_paths)


delete_broken_images(r'c:\so\69805310\images')

