Python 2: poor performance in simple math operations

I am trying to do a small variation of Euclidean distance (I am fully aware that this is not the Euclidean formula) for color quantization, mapping RGB values to a 16-color palette. The code itself works, but in Python it runs more than 25 times slower than in Java.
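Written out, the distance used in the code is the sum of absolute channel differences (the L1, or Manhattan, distance). With $(R_k, G_k, B_k)$ denoting palette entry $k$ (notation introduced here), the selected colour is the index with the smallest distance, ties going to the lowest index as with `list.index`:

$$
d_k = |r - R_k| + |g - G_k| + |b - B_k|, \qquad
k^{*} = \operatorname*{arg\,min}_{k \in \{0, \dots, 15\}} d_k
$$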

The main method in Python looks like this:

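def getBaseColor(rValue=128, gValue=128, bValue=128):
    # coloresWeb: global list of the 16 palette colours as [r, g, b] (defined in the full code)
    allDistances = [450] * 16
    for x in range(0, 16):
        valoresColor = coloresWeb[x]
        # sum of absolute channel differences to palette entry x
        allDistances[x] = (abs(valoresColor[0] - rValue) + abs(valoresColor[1] - gValue)
                           + abs(valoresColor[2] - bValue))
    return allDistances.index(min(allDistances))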
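For anyone wanting to run the lookup in isolation, here is a self-contained sketch. It assumes `coloresWeb` is the same 16-colour palette listed in the NumPy answer further down (the question's own definition is not shown), uses the hypothetical name `get_base_color`, and replaces the temporary distance list with a running minimum. The result is the same (the first minimal index); whether it is measurably faster in CPython would have to be benchmarked, since the per-iteration interpreter overhead remains.

```python
# ASSUMPTION: same 16-colour palette as the NumPy answer below.
coloresWeb = [
    (0, 0, 0), (0, 0, 128), (0, 128, 0), (0, 128, 128),
    (128, 0, 0), (128, 0, 128), (128, 128, 0), (192, 192, 192),
    (128, 128, 128), (0, 0, 255), (0, 255, 0), (0, 255, 255),
    (255, 0, 0), (255, 0, 255), (255, 255, 0), (255, 255, 255),
]


def get_base_color(r=128, g=128, b=128, palette=coloresWeb):
    """Return the index of the palette entry with the smallest L1 distance.

    Binding the palette as a default argument turns it into a local
    variable lookup inside the loop instead of a global one.
    """
    best_index = 0
    best_distance = 766  # larger than the maximum possible 3 * 255 = 765
    for index, (pr, pg, pb) in enumerate(palette):
        distance = abs(pr - r) + abs(pg - g) + abs(pb - b)
        # strict '<' keeps the first minimum, like list.index(min(...))
        if distance < best_distance:
            best_index = index
            best_distance = distance
    return best_index


print(get_base_color(134, 234, 43))  # -> 6, i.e. (128, 128, 0)
```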

I did some small benchmark tests (1M operations), and Java is 25 times faster than Python (2.7.9). Using PyPy helps a lot, but it is still far from Java.

Python 2: ~5.2 s
Java: ~0.2 s
PyPy: ~0.6 s

My question is: am I doing something wrong in Python, or is it just very slow by nature? This process needs to run hundreds of millions of times, and no, this is not image processing (although it looks like it).

Fully functional minimal code for both Python and Java is provided here.

With NumPy, calculating all one million points at once:

import time
import numpy as np

webColours = np.array([
    [0,0,0],
    [0,0,128],
    [0,128,0],
    [0,128,128],
    [128,0,0],
    [128,0,128],
    [128,128,0],
    [192,192,192],
    [128,128,128],
    [0,0,255],
    [0,255,0],
    [0,255,255],
    [255,0,0],
    [255,0,255],
    [255,255,0],
    [255,255,255]
])

def getBaseColours(colours):
    # colours is 1000000x3
    # set up a distances array (16x1000000)
    distances = np.zeros((16, np.size(colours, 0)))
    for colour in range(16):
        # calculate distance of each input colour to this webColour
        distances[colour] = np.sum(np.abs(colours - webColours[colour]), 1)
    # which of 16 distances is the least for each of 1000000 colours
    return np.argmin(distances, 0)

startTime = time.time()
colour = np.array([134,234,43])
colours = np.tile(colour, (1000000, 1))
getBaseColours(colours)
print("Time: " + str(time.time()-startTime))

Time: 0.9 s on my system (where your Python code takes 9 s). Also, I'm a newbie at NumPy, so the code could possibly be optimised even further.
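One possible further step, sketched here and not benchmarked: the remaining Python-level loop over the 16 palette entries can also be removed with broadcasting, so NumPy computes all N × 16 distances in a single expression. The function name `get_base_colours_broadcast` is hypothetical, and the `int16` dtype is a choice made to keep the (N, 16, 3) intermediate small; at 2 bytes per element it is about 96 MB for one million colours, so very large inputs may need to be processed in chunks.

```python
import numpy as np

# ASSUMPTION: same 16-colour palette as in the answer above. int16 is enough
# for differences in -255..255 and keeps the large intermediate array small.
web_colours = np.array([
    [0, 0, 0], [0, 0, 128], [0, 128, 0], [0, 128, 128],
    [128, 0, 0], [128, 0, 128], [128, 128, 0], [192, 192, 192],
    [128, 128, 128], [0, 0, 255], [0, 255, 0], [0, 255, 255],
    [255, 0, 0], [255, 0, 255], [255, 255, 0], [255, 255, 255],
], dtype=np.int16)


def get_base_colours_broadcast(colours):
    """For each row of an (N, 3) RGB array, the index of the nearest palette colour (L1)."""
    colours = np.asarray(colours, dtype=np.int16)
    # (N, 1, 3) - (1, 16, 3) broadcasts to (N, 16, 3): each colour minus each palette entry
    diffs = colours[:, np.newaxis, :] - web_colours[np.newaxis, :, :]
    np.abs(diffs, out=diffs)          # absolute values in place, no second temporary
    distances = diffs.sum(axis=2)     # (N, 16) sums of absolute channel differences
    return distances.argmin(axis=1)   # first minimum per row, like np.argmin above


# Small demo input; the answer above uses 1000000 rows instead of 1000
colours = np.tile(np.array([134, 234, 43]), (1000, 1))
print(get_base_colours_broadcast(colours)[:5])  # -> [6 6 6 6 6]
```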

Since you only want to find the nearest neighbor for color quantization, you don't actually need to calculate all the distances the way you are doing. In particular, using a KDTree in this case would be much more efficient.
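A minimal sketch of that suggestion with SciPy's `cKDTree`, assuming the 16-colour palette from the NumPy answer. Passing `p=1` to `query` makes the tree use the same sum-of-absolute-differences distance as the question; `nearest_palette_index` is a hypothetical helper name. Two caveats: when two palette entries are exactly equally close, the tree is not guaranteed to return the lowest index the way `list.index` or `np.argmin` does, and with only 16 palette entries it is worth measuring whether the tree actually beats a brute-force NumPy version.

```python
import numpy as np
from scipy.spatial import cKDTree

# ASSUMPTION: the 16-colour palette from the NumPy answer.
web_colours = np.array([
    [0, 0, 0], [0, 0, 128], [0, 128, 0], [0, 128, 128],
    [128, 0, 0], [128, 0, 128], [128, 128, 0], [192, 192, 192],
    [128, 128, 128], [0, 0, 255], [0, 255, 0], [0, 255, 255],
    [255, 0, 0], [255, 0, 255], [255, 255, 0], [255, 255, 255],
])

# Build the tree over the palette once; it can then be queried repeatedly.
palette_tree = cKDTree(web_colours)


def nearest_palette_index(colours):
    """Index of the nearest palette colour for each row of an (N, 3) array."""
    # p=1 selects the Minkowski 1-norm, i.e. the sum of absolute differences
    _, indices = palette_tree.query(np.asarray(colours), k=1, p=1)
    return indices


print(nearest_palette_index([[134, 234, 43], [0, 0, 0]]))
```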

Otherwise, as others have noted, you get slow results in Python because such operations would not normally be performed in pure Python. The default approach would be to use NumPy, and in this case it can be sped up further with a specialized function from SciPy (see scipy.spatial.distance or, better in this case, scipy.spatial.cKDTree). Finally, if that is still not good enough, you can use Cython, PyPy, etc.
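For the `scipy.spatial.distance` route, a sketch using `cdist` with the `"cityblock"` metric (SciPy's name for the L1 distance), again assuming the palette from the NumPy answer; `nearest_palette_index_cdist` is a hypothetical name. Because it builds the full distance matrix and then uses `argmin`, ties resolve to the lowest index, as in the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

# ASSUMPTION: the 16-colour palette from the NumPy answer.
web_colours = np.array([
    [0, 0, 0], [0, 0, 128], [0, 128, 0], [0, 128, 128],
    [128, 0, 0], [128, 0, 128], [128, 128, 0], [192, 192, 192],
    [128, 128, 128], [0, 0, 255], [0, 255, 0], [0, 255, 255],
    [255, 0, 0], [255, 0, 255], [255, 255, 0], [255, 255, 255],
])


def nearest_palette_index_cdist(colours):
    """Nearest palette index (L1 distance) for each row of an (N, 3) array."""
    # cdist returns the full (N, 16) matrix of pairwise cityblock (L1) distances
    distances = cdist(np.asarray(colours), web_colours, metric="cityblock")
    # argmin returns the first minimum in each row, matching list.index(min(...))
    return distances.argmin(axis=1)


print(nearest_palette_index_cdist([[134, 234, 43], [255, 255, 255]]))
```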

Plain CPython is slow by nature; it comes from the very design of the interpreter. Simplifying a lot, CPython is a C program that compiles your code to bytecode and then constantly reads those instructions, one at a time, and acts on each accordingly.

So for every instruction, you have a full "context switch" from your code down to its representation in C, including all name lookups and wrapper conversions, then the actual computation, and then back to your code again. Loops are especially costly, because this overhead is paid again on every iteration. Since CPython acts one instruction at a time, it cannot do optimizations such as prefetching data or vectorizing.
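One way to see this per-instruction work is the standard `dis` module, which prints the bytecode that CPython's interpreter loop dispatches one by one. The helper `manhattan` is hypothetical and simply mirrors one iteration of the question's loop body. Instruction names differ between Python versions, but each name lookup, subscript, subtraction, and call to `abs` shows up as a separate instruction.

```python
import dis


def manhattan(c, r, g, b):
    # The same expression as one iteration of the question's loop body
    return abs(c[0] - r) + abs(c[1] - g) + abs(c[2] - b)


# Print the bytecode CPython executes every time this expression is evaluated
dis.dis(manhattan)
```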

The upside is that you get powerful introspection and self-modification with a very simple implementation. The downside is that the interpreter has to go all the way down at every step.

In contrast, both Java and PyPy use just-in-time compilation. When they run through a loop, they notice that they have already done the same thing (instruction-wise) and compile it into optimized machine code. This is also why PyPy can sometimes be slower than CPython: it needs a warm-up phase before it can actually optimize repeated operations. If operations are repeated only a few times or not at all, there is no advantage.
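A rough way to observe the warm-up effect is to time the same batch of work several times in one process. This is a hypothetical benchmark sketch: the helper names, the synthetic input colours, and the batch sizes are made up, and the palette is assumed to be the one from the NumPy answer. The expectation, not verified here, is that under PyPy the first batch or two are slower than later ones, while under CPython the batch times stay roughly flat.

```python
import time

# ASSUMPTION: the 16-colour palette from the NumPy answer.
PALETTE = [
    (0, 0, 0), (0, 0, 128), (0, 128, 0), (0, 128, 128),
    (128, 0, 0), (128, 0, 128), (128, 128, 0), (192, 192, 192),
    (128, 128, 128), (0, 0, 255), (0, 255, 0), (0, 255, 255),
    (255, 0, 0), (255, 0, 255), (255, 255, 0), (255, 255, 255),
]


def nearest_index(r, g, b):
    # Same computation as the question's getBaseColor
    distances = [abs(pr - r) + abs(pg - g) + abs(pb - b) for pr, pg, pb in PALETTE]
    return distances.index(min(distances))


def run_batch(n):
    """One batch of work: n lookups over synthetic colours (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += nearest_index(i % 256, (i * 7) % 256, (i * 13) % 256)
    return total


def time_batches(batches=5, n=20000):
    """Wall-clock seconds for each of several identical batches in one process."""
    timings = []
    for _ in range(batches):
        start = time.time()
        run_batch(n)
        timings.append(time.time() - start)
    return timings


for number, seconds in enumerate(time_batches()):
    print("batch %d: %.4f s" % (number, seconds))
```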


Disclaimer: this is a very simplified view of the CPython interpreter. For example, there are some "short-circuit" instructions, such as list comprehensions, which are handled more efficiently than regular loops. However, since these can still call arbitrary code, their performance is limited as well.
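For comparison, this is what the question's distance loop looks like as a list comprehension. Both helper names are hypothetical, the palette is assumed to be the one from the NumPy answer, and the timing is left for the reader to run, since how much (if anything) the comprehension gains depends on the Python version and interpreter.

```python
import timeit
from functools import partial

# ASSUMPTION: the 16-colour palette from the NumPy answer.
PALETTE = [
    (0, 0, 0), (0, 0, 128), (0, 128, 0), (0, 128, 128),
    (128, 0, 0), (128, 0, 128), (128, 128, 0), (192, 192, 192),
    (128, 128, 128), (0, 0, 255), (0, 255, 0), (0, 255, 255),
    (255, 0, 0), (255, 0, 255), (255, 255, 0), (255, 255, 255),
]


def distances_loop(r, g, b):
    """Explicit index-based loop, in the style of the question's code."""
    all_distances = [0] * len(PALETTE)
    for x in range(len(PALETTE)):
        colour = PALETTE[x]
        all_distances[x] = abs(colour[0] - r) + abs(colour[1] - g) + abs(colour[2] - b)
    return all_distances


def distances_comprehension(r, g, b):
    """The same 16 distances built with a list comprehension."""
    return [abs(pr - r) + abs(pg - g) + abs(pb - b) for pr, pg, pb in PALETTE]


for func in (distances_loop, distances_comprehension):
    # timeit calls the callable 100000 times and returns the total seconds
    seconds = timeit.timeit(partial(func, 134, 234, 43), number=100000)
    print("%s: %.3f s" % (func.__name__, seconds))
```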
