
How can I optimize the performance of an Image comparison script?

I wrote a script that compares a huge set of images (more than 4500 files) against each other using a root mean square comparison. At first it resizes each image to 800x600 and takes a histogram. After that it builds an array of combinations and distributes them evenly to four threads which calculate the root mean square of every combination. Images with an RMS below 500 will be moved into folders to be manually sorted out later.

#!/usr/bin/python3

import sys
import os
import math
import operator
import functools
import datetime
import threading
import queue
import itertools
from PIL import Image


def calc_rms(hist1, hist2):
    return math.sqrt(
        functools.reduce(operator.add, map(
            lambda a, b: (a - b) ** 2, hist1, hist2
        )) / len(hist1)
    )


def make_histogram(imgs, path, qout):
    for img in imgs:
        try:
            tmp = Image.open(os.path.join(path, img))
            tmp = tmp.resize((800, 600), Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
            qout.put([img, tmp.histogram()])
        except Exception:
            print('bad image: ' + img)
    return


def compare_hist(pairs, path):
    for pair in pairs:
        rms = calc_rms(pair[0][1], pair[1][1])
        if rms < 500:
            folder = 'maybe duplicates'
            if rms == 0:
                folder = 'exact duplicates'
            try:
                os.rename(os.path.join(path, pair[0][0]), os.path.join(path, folder, pair[0][0]))
            except Exception:
                pass
            try:
                os.rename(os.path.join(path, pair[1][0]), os.path.join(path, folder, pair[1][0]))
            except Exception:
                pass
    return


def get_time():
    return datetime.datetime.now().strftime("%H:%M:%S")


def chunkify(lst, n):
    return [lst[i::n] for i in range(n)]


def main(path):
    starttime = get_time()
    qout = queue.Queue()
    images = []
    for img in os.listdir(path):
        if os.path.isfile(os.path.join(path, img)):
            images.append(img)
    imglen = len(images)
    print('Resizing ' + str(imglen) + ' Images ' + starttime)
    images = chunkify(images, 4)
    threads = []
    for x in range(4):
        threads.append(threading.Thread(target=make_histogram, args=(images[x], path, qout)))

    [x.start() for x in threads]
    [x.join() for x in threads]

    resizetime = get_time()
    print('Done resizing ' + resizetime)

    histlist = []
    for i in qout.queue:
        histlist.append(i)

    if not os.path.exists(os.path.join(path, 'exact duplicates')):
        os.makedirs(os.path.join(path, 'exact duplicates'))
    if not os.path.exists(os.path.join(path, 'maybe duplicates')):
        os.makedirs(os.path.join(path, 'maybe duplicates'))

    combinations = []
    for img1, img2 in itertools.combinations(histlist, 2):
        combinations.append([img1, img2])

    combicount = len(combinations)
    print('Going through ' + str(combicount) + ' combinations of ' + str(imglen) + ' Images. Please stand by')
    combinations = chunkify(combinations, 4)

    threads = []

    for x in range(4):
        threads.append(threading.Thread(target=compare_hist, args=(combinations[x], path)))

    [x.start() for x in threads]
    [x.join() for x in threads]

    print('\nstarted at ' + starttime)
    print('resizing done at ' + resizetime)
    print('went through ' + str(combicount) + ' combinations of ' + str(imglen) + ' Images')
    print('all done at ' + get_time())

if __name__ == '__main__':
    main(sys.argv[1]) # sys.argv[1] has to be a folder of images to compare

This works, but the comparison runs for hours after the resizes complete within 15 to 20 minutes. At first I assumed it was a locking queue from which the workers got their combinations, so I replaced it with pre-defined array chunks. This did not reduce the execution time. I also ran it without moving the files, to exclude a possible hard drive issue.

Profiling this using cProfile provides the following output.

Resizing 4566 Images 23:51:05
Done resizing 00:05:07
Going through 10421895 combinations of 4566 Images. Please stand by

started at 23:51:05
resizing done at 00:05:07
went through 10421895 combinations of 4566 Images
all done at 03:09:41
         10584539 function calls (10584414 primitive calls) in 11918.945 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     16/1    0.001    0.000 11918.945 11918.945 {built-in method exec}
        1    2.962    2.962 11918.945 11918.945 imcomp.py:3(<module>)
        1   19.530   19.530 11915.876 11915.876 imcomp.py:60(main)
       51 11892.690  233.190 11892.690  233.190 {method 'acquire' of '_thread.lock' objects}
        8    0.000    0.000 11892.507 1486.563 threading.py:1028(join)
        8    0.000    0.000 11892.507 1486.563 threading.py:1066(_wait_for_tstate_lock)
        1    0.000    0.000 11051.467 11051.467 imcomp.py:105(<listcomp>)
        1    0.000    0.000  841.040  841.040 imcomp.py:76(<listcomp>)
 10431210    1.808    0.000    1.808    0.000 {method 'append' of 'list' objects}
     4667    1.382    0.000    1.382    0.000 {built-in method stat}

The full profiler output can be found here.

Considering the fourth line, I'm guessing that the threads are somehow locking. But why, and why exactly 51 times, regardless of the number of images?

I am running this on Windows 7 64 bit.

Thanks in advance.

One major issue is that you're using threads to do work that is at least partially CPU-bound. Because of the Global Interpreter Lock, only one CPython thread can ever run at a time, which means you can't take advantage of multiple CPU cores. This will make multi-threaded performance for CPU-bound tasks at best no different from single-core execution, and probably even worse, because of the extra overhead added by threading. This is noted in the threading documentation:

CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

To get around the limitations of the GIL, you should do as the docs say, and use the multiprocessing library instead of the threading library:

import multiprocessing
...

qout = multiprocessing.Queue()

for x in range(4):
    threads.append(multiprocessing.Process(target=make_histogram, args=(images[x], path, qout)))

...
for x in range(4):
    threads.append(multiprocessing.Process(target=compare_hist, args=(combinations[x], path)))

As you can see, multiprocessing for the most part is a drop-in replacement for threading, so the changes shouldn't be too difficult to make. The only complication would be if any of the arguments you're passing between processes aren't picklable, though I think all of them are in your case. There is also an increased cost of IPC to send Python data structures between processes, but I suspect the benefit of truly parallel computations will outweigh that additional overhead.

All that said, you may still be somewhat I/O-bound here, because of the reliance on reads/writes to disk. Parallelizing won't make your disk I/O faster, so there's not much that can be done there.
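To make the drop-in change concrete, here is a minimal sketch of the comparison stage rewritten around multiprocessing.Pool (the function name pair_rms and the toy histograms are made up for illustration, not taken from the original script):

```python
import math
import multiprocessing


def pair_rms(pair):
    # pair = ((name1, hist1), (name2, hist2)); returns (name1, name2, rms).
    # Only picklable data (strings and lists of ints) crosses the process boundary.
    (name1, hist1), (name2, hist2) = pair
    rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(hist1, hist2)) / len(hist1))
    return name1, name2, rms


if __name__ == '__main__':  # required on Windows, where child processes re-import the module
    # toy histograms standing in for real Image.histogram() output
    pairs = [(('a.jpg', [0, 0, 4]), ('b.jpg', [0, 0, 4])),
             (('a.jpg', [0, 0, 4]), ('c.jpg', [3, 0, 0]))]
    with multiprocessing.Pool(processes=4) as pool:
        for name1, name2, rms in pool.map(pair_rms, pairs):
            if rms < 500:
                pass  # move the two files here, in the parent process
```

Pool.map chunks the work itself, so chunkify is no longer needed for this stage, and doing the os.rename calls in the parent serializes the file moves, so two workers can't race to move the same image.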

With 4500 images to compare, I would suggest multiprocessing on a file-level, not (necessarily) multithreading within the image. As @dano has pointed out, the GIL will get in the way for that. My strategy would be:

  1. one worker process per core (or a configured number);
  2. one orchestration process, which forks off the above and does some IPC to coordinate jobs to workers.

Looking (briefly) at your code, it looks like it would benefit from a lazy language; I don't see anything that attempts to short-circuit comparisons. For example, if you do the RMS comparison for each segment of an image, you can stop comparing chunks once you determine they are sufficiently different. You might then also care to change the way you iterate through the chunks, and the size/shape of the chunks.
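The short-circuit idea can be sketched on the histograms themselves (the function name and the chunk size are illustrative, not from the original script): accumulate squared differences one slice at a time and bail out as soon as the running sum already guarantees the RMS will reach the threshold.

```python
def rms_exceeds(hist1, hist2, threshold, chunk=256):
    # RMS >= threshold  <=>  sum of squared diffs >= threshold**2 * n,
    # and the sum only ever grows, so we can stop early.
    limit = threshold ** 2 * len(hist1)
    total = 0
    for i in range(0, len(hist1), chunk):
        total += sum((a - b) ** 2
                     for a, b in zip(hist1[i:i + chunk], hist2[i:i + chunk]))
        if total >= limit:
            return True   # definitely not a duplicate; skip the remaining slices
    return False
```

For clearly different images the loop exits after the first slice or two, which is where most of the 10 million comparisons should land.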

Apart from that, I would consider looking at cheaper mechanisms that avoid doing so many square roots; possibly using something that creates an 'approximate' square root, perhaps using a look-up table.
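In this particular script the square root can be dropped entirely rather than approximated: since the RMS is only ever compared against a fixed threshold, comparing the sum of squared differences against threshold² · n is equivalent. A sketch (the function name is made up):

```python
def is_candidate(hist1, hist2, threshold=500):
    # RMS < threshold  <=>  sum((a-b)**2) < threshold**2 * n,
    # so no math.sqrt call is needed per pair at all.
    sq_sum = sum((a - b) ** 2 for a, b in zip(hist1, hist2))
    return sq_sum < threshold ** 2 * len(hist1)
```

The exact-duplicate case (RMS == 0) maps to sq_sum == 0, so the folder choice can be made without a root either.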

If I'm not mistaken, you could also create an intermediate form (the histogram) that you should keep temporarily. No need to save the 800x600 image.
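One way to keep that intermediate form between runs (the cache file name and the helper are hypothetical) is to pickle the histogram list after the first pass, so a re-run skips the 15-20 minute resize stage entirely:

```python
import os
import pickle

HIST_CACHE = 'histograms.pickle'  # hypothetical cache file name


def load_or_build_histograms(path, build):
    # Reuse previously computed histograms instead of resizing every run.
    # `build` is any callable that produces the [(filename, histogram), ...] list,
    # e.g. a wrapper around the existing make_histogram pass.
    cache = os.path.join(path, HIST_CACHE)
    if os.path.exists(cache):
        with open(cache, 'rb') as f:
            return pickle.load(f)
    hists = build(path)
    with open(cache, 'wb') as f:
        pickle.dump(hists, f)
    return hists
```

The cache would need to be invalidated when images are added or removed (for example by comparing file lists or mtimes), which is omitted here.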

Also, it would be useful to know what you mean by 'equal' with regard to this exercise.
