Pure Python faster than Numpy? Can I make this numpy code faster?

I need to compute the min, max, and mean from a specific list of faces/vertices. I tried to optimize this computation with Numpy, but without success.

Here is my test case:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
'''
Module Started 22 févr. 2013
@note: test case comparaison numpy vs python
@author: Python4D/damien
'''

import numpy as np
import time


def Fnumpy(vertices):
  np_vertices=np.array(vertices)
  _x=np_vertices[:,:,0]
  _y=np_vertices[:,:,1]
  _z=np_vertices[:,:,2]
  _min=[np.min(_x),np.min(_y),np.min(_z)]
  _max=[np.max(_x),np.max(_y),np.max(_z)]
  _mean=[np.mean(_x),np.mean(_y),np.mean(_z)]
  return _mean,_max,_min

def Fpython(vertices):
  list_x=[item[0] for sublist in vertices for item in sublist]
  list_y=[item[1] for sublist in vertices for item in sublist]
  list_z=[item[2] for sublist in vertices for item in sublist]
  taille=len(list_x)
  _mean=[sum(list_x)/taille,sum(list_y)/taille,sum(list_z)/taille]
  _max=[max(list_x),max(list_y),max(list_z)]
  _min=[min(list_x),min(list_y),min(list_z)]    
  return _mean,_max,_min

if __name__=="__main__":
  vertices=[[[1.1,2.2,3.3,4.4]]*4]*1000000
  _t=time.clock()
  print ">>NUMPY >>{} for {}s.".format(Fnumpy(vertices),time.clock()-_t)
  _t=time.clock()
  print ">>PYTHON>>{} for {}s.".format(Fpython(vertices),time.clock()-_t)

The results are:

Numpy:

([1.1000000000452519, 2.2000000000905038, 3.3000000001880174], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998]) for 27.327068618s.

Python:

([1.100000000045252, 2.200000000090504, 3.3000000001880174], [1.1, 2.2, 3.3], [1.1, 2.2, 3.3]) for 1.81366938593s.

Pure Python is 15x faster than Numpy!

The reason your Fnumpy is slower is that it contains an additional step not done by Fpython: the creation of a numpy array in memory. If you move the line np_vertices=np.array(vertices) outside of Fnumpy and the timed section, your results will be very different:

>>NUMPY >>([1.1000000000452519, 2.2000000000905038, 3.3000000001880174], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998]) for 0.500802s.
>>PYTHON>>([1.100000000045252, 2.200000000090504, 3.3000000001880174], [1.1, 2.2, 3.3], [1.1, 2.2, 3.3]) for 2.182239s.
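To see where the time goes, the list-to-array conversion can be timed separately from the reductions. The following is only a sketch of that idea, not the code used for the numbers above: it assumes Python 3 (so it uses time.perf_counter and print() rather than time.clock), the helper name stats is made up for illustration, and the input is smaller than in the question so it finishes quickly.

```python
import time
import numpy as np

def stats(np_vertices):
    # Same reductions as Fnumpy, but on an array that already exists.
    x = np_vertices[:, :, 0]
    y = np_vertices[:, :, 1]
    z = np_vertices[:, :, 2]
    return ([x.mean(), y.mean(), z.mean()],
            [x.max(), y.max(), z.max()],
            [x.min(), y.min(), z.min()])

# 100,000 faces instead of the 1,000,000 in the question (assumption, for speed).
vertices = [[[1.1, 2.2, 3.3, 4.4]] * 4] * 100000

t0 = time.perf_counter()
np_vertices = np.array(vertices)   # list -> array conversion
t1 = time.perf_counter()
result = stats(np_vertices)        # the actual NumPy work
t2 = time.perf_counter()

print("conversion: {:.3f}s".format(t1 - t0))
print("reductions: {:.3f}s".format(t2 - t1))
```

On a typical machine the conversion line is expected to dominate, which is the effect described above, but the exact ratio depends on the NumPy version and hardware.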

You can also speed up the allocation step significantly by providing a datatype hint to numpy when you create the array. If you tell Numpy you have an array of floats, then even if you leave the np.array() call in the timing loop it will beat the pure Python version.

If I change np_vertices=np.array(vertices) to np_vertices=np.array(vertices, dtype=np.float_) and keep it in Fnumpy, the Fnumpy version will beat Fpython even though it has to do a lot more work:

>>NUMPY >>([1.1000000000452519, 2.2000000000905038, 3.3000000001880174], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998], [1.1000000000000001, 2.2000000000000002, 3.2999999999999998]) for 1.586066s.
>>PYTHON>>([1.100000000045252, 2.200000000090504, 3.3000000001880174], [1.1, 2.2, 3.3], [1.1, 2.2, 3.3]) for 2.196787s.
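A minimal sketch of the dtype hint follows, assuming Python 3 and a recent NumPy. It spells the type as np.float64 because the np.float_ alias was removed in NumPy 2.0. The function name fnumpy_axis is hypothetical, and the axis-based reductions are a variation of my own rather than part of the answer above: they compute x, y and z in a single call instead of three.

```python
import numpy as np

def fnumpy_axis(vertices):
    # dtype=np.float64 tells NumPy the element type up front,
    # so it does not have to infer it from the nested lists.
    arr = np.array(vertices, dtype=np.float64)
    # Merge the face and vertex axes: shape (n_faces * 4, 4).
    flat = arr.reshape(-1, arr.shape[-1])
    # Keep only the x, y, z columns and reduce along the rows.
    xyz = flat[:, :3]
    return (xyz.mean(axis=0).tolist(),
            xyz.max(axis=0).tolist(),
            xyz.min(axis=0).tolist())

vertices = [[[1.1, 2.2, 3.3, 4.4]] * 4] * 1000
mean, vmax, vmin = fnumpy_axis(vertices)
print(mean, vmax, vmin)
```

The return order (mean, max, min) matches Fnumpy and Fpython from the question, so the function can be dropped into the same timing harness.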

As already pointed out by others, your problem is the conversion from list to array. By using the appropriate numpy functions for that, you will beat Python. I modified the main part of your program:

if __name__=="__main__":
  _t = time.clock()
  vertices_np = np.resize(np.array([ 1.1, 2.2, 3.3, 4.4 ], dtype=np.float64), 
                          (1000000, 4, 4))
  print "Creating numpy vertices: {}".format(time.clock() - _t)
  _t = time.clock()
  vertices=[[[1.1,2.2,3.3,4.4]]*4]*1000000
  print "Creating python vertices: {}".format(time.clock() - _t)
  _t=time.clock()
  print ">>NUMPY >>{} for {}s.".format(Fnumpy(vertices_np),time.clock()-_t)
  _t=time.clock()
  print ">>PYTHON>>{} for {}s.".format(Fpython(vertices),time.clock()-_t)

Running your code with the modified main part gives the following on my machine:

Creating numpy vertices: 0.6
Creating python vertices: 0.01
>>NUMPY >>([1.1000000000452519, 2.2000000000905038, 3.3000000001880174], 
[1.1000000000000001, 2.2000000000000002, 3.2999999999999998], [1.1000000000000001, 
2.2000000000000002, 3.2999999999999998]) for 0.5s.
>>PYTHON>>([1.100000000045252, 2.200000000090504, 3.3000000001880174], [1.1, 2.2, 3.3], 
[1.1, 2.2, 3.3]) for 1.91s.

Although creating the array with Numpy tools still takes somewhat longer than creating the nested lists with Python's list multiplication operator (0.6s versus 0.01s), you gain a factor of about 4 for the run-time relevant part of your code. If I replace the line:

np_vertices=np.array(vertices)

with

np_vertices = np.asarray(vertices)

to avoid copying a big array, the running time of the numpy function even goes down to 0.37s on my machine, more than 5 times faster than the pure Python version.
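The difference between the two calls can be checked on a small array. This is an illustrative sketch assuming Python 3; the variable names are made up. Note that the saving only applies when the argument is already an ndarray, as it is for vertices_np here; a nested list has to be converted into a new array either way.

```python
import numpy as np

a = np.zeros((1000, 4, 4), dtype=np.float64)

copied = np.array(a)    # np.array copies its input by default
same = np.asarray(a)    # np.asarray hands back the input if it is already a suitable ndarray

copied_is_copy = not np.shares_memory(copied, a)
same_is_input = same is a
print(copied_is_copy, same_is_input)

# A nested list has no array buffer to reuse, so a new array is built here too.
from_list = np.asarray([[1.1, 2.2], [3.3, 4.4]])
print(from_list.shape)
```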

In your real code, if you know the number of vertices in advance, you can preallocate the appropriate array via np.empty(), then fill it with the appropriate data, and pass it to the numpy version of your function.
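The preallocation idea might look roughly like this. It is a sketch under assumptions, written for Python 3: iter_faces is a hypothetical stand-in for wherever the real faces come from, and the face count is assumed to be known before the data arrives.

```python
import numpy as np

def iter_faces(n_faces):
    # Hypothetical data source: yields one face (4 vertices of 4 floats) at a time.
    for i in range(n_faces):
        yield [[i + 0.1, i + 0.2, i + 0.3, 1.0]] * 4

n_faces = 1000

# np.empty allocates without initialising, so every slot must be written before use.
vertices_np = np.empty((n_faces, 4, 4), dtype=np.float64)
for i, face in enumerate(iter_faces(n_faces)):
    vertices_np[i] = face   # write each 4x4 face into its preallocated slot

# Reduce over all vertices of all faces, keeping x, y, z as columns.
xyz = vertices_np[:, :, :3].reshape(-1, 3)
mins, maxs, means = xyz.min(axis=0), xyz.max(axis=0), xyz.mean(axis=0)
print(mins, maxs, means)
```

Filling the array face by face still runs a Python-level loop, so the gain comes from never building the large nested list and never converting it in one expensive step.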
