Fast replace in numpy array

I have been trying to implement some modifications to speed up this pseudo code:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B=np.array([np.power(A,n) for n in [3,4,5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

The elements of A are often repeated 10-20 times, and the shape of B needs to be retained because it is later multiplied by another array of the same shape.

My first idea was to use the following code:

uA=np.unique(A)
uB=np.array([np.power(uA,n) for n in [3,4,5]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]): Temp[A==k] = v
    B.append(Temp)
B=np.array(B)
### Also any better way to create the numpy array B?

This seems fairly terrible and there is likely a better way. Any idea on how to speed this up would be much appreciated.

Thank you for your time.

Here is an update. I realized that my function was poorly coded. Thank you to everyone for the suggestions. I will try to phrase my questions better in the future so that they show everything required.

Normal='''
import numpy as np
from scipy.special import factorial
def func(value,n):
    if n==0: return 1
    else: return np.power(value,n)/factorial(n,exact=False)+func(value,n-1)
A=np.random.randint(10,size=250)
A=np.unique(A)
B=np.array([func(A,n) for n in [6,8,10]])
'''

Me='''
import numpy as np
from scipy.special import factorial
def func(value,n):
    if n==0: return 1
    else: return np.power(value,n)/factorial(n,exact=False)+func(value,n-1)
A=np.random.randint(10,size=250)
uA=np.unique(A)
# As timed, func runs on the full array A; func(uA,n) was probably intended.
uB=np.array([func(A,n) for n in [6,8,10]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]): Temp[A==k] = v
    B.append(Temp)
B=np.array(B)
'''


Alex='''
import numpy as np
from scipy.special import factorial
A=np.random.randint(10,size=250)
power=np.arange(11)
fact=factorial(np.arange(11),exact=False).reshape(-1,1)
power=np.power(A,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B=np.vstack((six,eight,ten))
'''

Alex2='''
import numpy as np
from scipy.special import factorial
def find_count(the_list):
    count = list(the_list).count
    result = [count(item) for item in set(the_list)]
    return result
A=np.random.randint(10,size=250)
A_unique=np.unique(A)
# As timed, counts are taken over A_unique, so every count is 1; find_count(A) was probably intended.
A_counts = np.array(find_count(A_unique))
fact=factorial(np.arange(11),exact=False).reshape(-1,1)
power=np.power(A_unique,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B_nodup=np.vstack((six,eight,ten))
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A_unique.shape[0]) ]
B = np.hstack( B_list )
'''


import timeit

print(timeit.timeit(Normal, number=10000))
print(timeit.timeit(Me, number=10000))
print(timeit.timeit(Alex, number=10000))
print(timeit.timeit(Alex2, number=10000))

Normal: 10.7544178963
Me:     23.2039361
Alex:    4.85648703575
Alex2:   4.18024992943
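
The partial sums in the update can also be taken in one pass with a cumulative sum. The following is only a sketch: it uses `math.factorial` as a stand-in for the SciPy factorial so that it depends on NumPy alone, and the names `terms`, `partial` and `B_ref` are illustrative. Note that the recursive `func(value, n)` sums the terms for k = 0 through n inclusive, so row n of the cumulative sum corresponds to `func(A, n)`.

```python
import math
import numpy as np

def func(value, n):
    # Recursive reference from the update: sum of value**k / k! for k = 0..n.
    if n == 0:
        return 1
    return np.power(value, n) / math.factorial(n) + func(value, n - 1)

rng = np.random.default_rng(0)      # fixed seed, only for reproducibility
A = rng.integers(10, size=250)

k = np.arange(11).reshape(-1, 1)    # exponents 0..10 as a column
fact = np.array([math.factorial(i) for i in range(11)], dtype=float).reshape(-1, 1)

terms = np.power(A, k) / fact       # shape (11, 250): term k for each element of A
partial = np.cumsum(terms, axis=0)  # partial[n] = sum of terms 0..n

B = partial[[6, 8, 10]]             # rows matching func(A, 6), func(A, 8), func(A, 10)
B_ref = np.array([func(A, n) for n in [6, 8, 10]])
```

Whether this beats the slicing approach for only three rows would need to be measured.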

You can broadcast np.power across A if you reshape it into a column vector.

>>> np.power(A.reshape(-1,1), [3,4,5]).T
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])
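
To see why this works, it may help to look at the shapes involved. The sketch below assumes the same `A` as in the question; the names `col`, `exponents`, `powers` and `B_ref` are only illustrative.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
exponents = np.array([3, 4, 5])

# A column vector of shape (9, 1) broadcasts against exponents of shape (3,)
# to give a (9, 3) result: one row per element of A, one column per exponent.
col = A.reshape(-1, 1)
powers = np.power(col, exponents)

# Transposing gives the (3, 9) layout from the question.
B = powers.T

# Reference result built with the original list comprehension.
B_ref = np.array([np.power(A, n) for n in exponents])
```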

Use a combination of numpy.tile() and numpy.hstack(), as follows:

A = np.array([1,2,3])
A_counts = np.array([3,3,3])
A_powers = np.array([[3],[4],[5]])
B_nodup = np.power(A, A_powers)
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A.shape[0]) ]
B = np.hstack( B_list )

The transpose and stack may be swapped, which may be faster:

B_list = [ np.tile( B_nodup[:,i], (A_counts[i], 1) ) for i in range(A.shape[0]) ]
B = np.transpose( np.vstack( B_list ) )

This is likely only worth doing if the function you are calculating is quite expensive, or if each value is duplicated many, many times (more than 10). Doing a tile and stack to avoid calculating the power function an extra 10 times is likely not worth it. Please benchmark and let us know.
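
The same expansion can probably be written more compactly with `np.repeat`, which is a different call from the tile-and-stack approach above and has not been benchmarked here. This sketch reuses the names `A`, `A_counts`, `A_powers` and `B_nodup` from above, and assumes that equal values sit next to each other in the original array, as in the question's example.

```python
import numpy as np

A = np.array([1, 2, 3])              # unique values
A_counts = np.array([3, 3, 3])       # how often each value is repeated
A_powers = np.array([[3], [4], [5]])

B_nodup = np.power(A, A_powers)      # shape (3, 3): one column per unique value

# Repeat column i of B_nodup A_counts[i] times along axis 1.
B = np.repeat(B_nodup, A_counts, axis=1)
```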

EDIT: Or you could just use broadcasting to get rid of the list comprehension:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B = np.power(A,[[3],[4],[5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

This is probably pretty fast, but doesn't actually do what you asked.
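
If the goal is to evaluate the function only once per distinct value and then put the results back in place, which is what the question asks for, one possible sketch uses the `return_inverse` option of `np.unique`. The names `uA` and `uB` follow the question; `inverse` is just an illustrative name.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

# uA holds the sorted unique values; inverse maps each element of A
# to its position in uA, so uA[inverse] reconstructs A.
uA, inverse = np.unique(A, return_inverse=True)

# Evaluate the (possibly expensive) function on the unique values only.
uB = np.power(uA, [[3], [4], [5]])   # shape (3, number of unique values)

# Expand back to the full width of A by indexing the columns.
B = uB[:, inverse]                   # shape (3, len(A))
```

Because the expansion goes through an index array, this does not require equal values to be adjacent in `A`.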

My go at it with 200k iterations; the first method is mine.

import numpy as np
import time

N = 200000

# Method 1 (mine): fill a preallocated array row by row
start = time.time()
for j in range(N):
    x = np.array([1,1,1,2,2,2,3,3,3])
    powers = np.array([3,4,5])
    result = np.zeros((powers.size,x.size)).astype(np.int32)
    for i in range(powers.size):
        result[i,:] = x**powers[i]
print(time.time()-start, "seconds")

# Method 2: broadcasting over a column of exponents
start = time.time()
for j in range(N):
    A=np.array([1,1,1,2,2,2,3,3,3])
    B = np.power(A,[[3],[4],[5]])
print(time.time()-start, "seconds")

# Method 3 (larsmans): broadcasting over A as a column vector.
# A is reused from the previous loop, so array creation is not timed here.
start = time.time()
for j in range(N):
    np.power(A.reshape(-1,1), [3,4,5]).T
print(time.time()-start, "seconds")

# Method 4: the original list comprehension
start = time.time()
for j in range(N):
    A=np.array([1,1,1,2,2,2,3,3,3])
    B=np.array([np.power(A,n) for n in [3,4,5]])
print(time.time()-start, "seconds")

Produces:

8.88000011444 seconds
9.25099992752 seconds
3.95399999619 seconds
7.43799996376 seconds

larsmans' method is clearly the fastest.
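
For a comparison where every method is timed on the same prebuilt array, something like the following `timeit` sketch may be more reliable than manual `time.time()` loops. The function names are made up for illustration, and the repeat counts are arbitrary.

```python
import timeit
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])

def loop_fill():
    # First method: fill a preallocated integer array row by row.
    powers = np.array([3, 4, 5])
    result = np.zeros((powers.size, A.size), dtype=np.int32)
    for i in range(powers.size):
        result[i, :] = A ** powers[i]
    return result

def broadcast_rows():
    # Broadcasting over a column of exponents.
    return np.power(A, [[3], [4], [5]])

def broadcast_column():
    # Broadcasting over A as a column vector, then transposing.
    return np.power(A.reshape(-1, 1), [3, 4, 5]).T

def list_comprehension():
    # The original approach from the question.
    return np.array([np.power(A, n) for n in [3, 4, 5]])

candidates = [loop_fill, broadcast_rows, broadcast_column, list_comprehension]

# Take the best of several repeats to reduce noise from other processes.
timings = {f.__name__: min(timeit.repeat(f, number=1000, repeat=3)) for f in candidates}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s per 1000 calls")
```

The absolute numbers will depend on the machine and NumPy version, so only the relative ordering is meaningful.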

(PS: how do you link to an answer or user here without an explicit URL? @larsman doesn't work.)
