3D distance vectorization
I need help vectorizing this code. Right now, with N=100, it takes a minute or so to run. I would like to speed that up. I have done something like this for a double loop, but never with a triple (3D) loop, and I am having difficulties.
import numpy as np

N = 100
n = 12
r = np.sqrt(2)
x = np.arange(-N, N+1)
y = np.arange(-N, N+1)
z = np.arange(-N, N+1)
C = 0
for i in x:
    for j in y:
        for k in z:
            # Only points with an even coordinate sum, excluding the origin.
            if (i+j+k) % 2 == 0 and (i*i+j*j+k*k != 0):
                p = np.sqrt(i*i+j*j+k*k)
                p = p/r
                q = (1/p)**n
                C += q
print('\n')
print(C)
Thanks to @Bill, I was able to get this to work. It is very fast now. Perhaps it could be done better, especially the two masks that replace the two conditions I originally had inside the loops.
from __future__ import division
import numpy as np
N = 100
n = 12
r = np.sqrt(2)
x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
ind = np.where((x*x+y*y+z*z)!=0)
x = x[ind]
y = y[ind]
z = z[ind]
p=np.sqrt(x*x+y*y+z*z)/r
ans = (1/p)**n
ans = np.sum(ans)
print('ans')
print(ans)
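The two filtering steps above can also be folded into a single boolean mask, which avoids indexing x, y and z twice. The following is only a sketch of that idea, not a benchmarked improvement. The function name lattice_sum_single_mask is made up for illustration, and a small N is used so that it runs quickly.

```python
from __future__ import division
import numpy as np


def lattice_sum_single_mask(N, n=12, r=np.sqrt(2)):
    # Hypothetical helper: same sum as above, but with one combined mask.
    x, y, z = np.meshgrid(*[np.arange(-N, N + 1)] * 3)
    d2 = x * x + y * y + z * z                 # squared distance from the origin
    mask = ((x + y + z) % 2 == 0) & (d2 != 0)  # both conditions at once
    p = np.sqrt(d2[mask]) / r
    return np.sum((1 / p) ** n)


print(lattice_sum_single_mask(3))
```

Computing the squared distance once and reusing it for both the mask and the square root also saves a second pass over the coordinate arrays.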
The meshgrid/where/indexing solution is already extremely fast. I made it about 65 % faster. This is not too much, but I will explain it anyway, step by step:
It was easiest for me to approach this problem with all 3D vectors in the grid being columns in one large 2D 3 x M array. meshgrid is the right tool for creating all the combinations (note that numpy version >= 1.7 is required for a 3D meshgrid), and vstack + reshape bring the data into the desired form. Example:
>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1],
       [0, 1, 0, 1, 0, 1, 0, 1]])
Each column is one 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and length 1 in all dimensions).
Let's call this array vectors (it contains all 3D vectors representing all points in the grid). Then, prepare a bool mask for selecting those vectors fulfilling your mod2 criterion:
mod2bool = np.sum(vectors, axis=0) % 2 == 0
np.sum(vectors, axis=0) creates a 1 x M array containing the element sum for each column vector. Hence, mod2bool is a 1 x M array with a bool value for each column vector. Now use this bool mask:
vectorsubset = vectors[:,mod2bool]
This selects all rows (:) and uses boolean indexing for filtering the columns; both are fast operations in numpy. Calculate the lengths of the remaining vectors, using the native numpy approach:
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
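To make the masking and length steps concrete, here is a sketch that applies them to the eight corner vectors of the 1x1x1 cube from the example above. It also computes the column lengths with np.linalg.norm and axis=0. That alternative is added here only for comparison and is not part of the timings further down.

```python
import numpy as np

# All corners of the unit cube as columns of a 3 x 8 array.
vectors = np.vstack(np.meshgrid(*[np.arange(0, 2)] * 3)).reshape(3, -1)

# Keep only the columns whose coordinate sum is even.
mod2bool = np.sum(vectors, axis=0) % 2 == 0
vectorsubset = vectors[:, mod2bool]

# Column-wise Euclidean lengths, computed in two equivalent ways.
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
lengths_norm = np.linalg.norm(vectorsubset, axis=0)

print(mod2bool)
print(vectorsubset)
print(lengths)
```

Four of the eight corners survive the mask: the origin, which has length 0, and three corners with exactly two nonzero coordinates.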
This is quite fast. However, scipy.stats.ss and bottleneck.ss can perform the squared sum calculation even faster than this.
Transform the lengths using your instructions:
with np.errstate(divide='ignore'):
    p = (r/lengths)**n
This involves dividing a finite number by zero, which results in Infs in the output array. This is entirely fine. We use numpy's errstate context manager for making sure that these zero divisions do not throw an exception or a runtime warning.
Now sum up the finite elements (ignore the infs) and return the sum:
return np.sum(p[np.isfinite(p)])
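As a sanity check of the steps above, the following sketch puts them into one function and compares the result against the plain triple loop from the question on a small grid. The names vectorized_sum and loop_sum and the choice N=4 are assumptions made for this illustration only.

```python
import numpy as np

n = 12
r = np.sqrt(2)


def vectorized_sum(N):
    # Build the 3 x M array of grid vectors, filter, and sum as described.
    grid = np.meshgrid(*[np.arange(-N, N + 1)] * 3)
    vectors = np.vstack(grid).reshape(3, -1)
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    lengths = np.sqrt(np.sum(vectors[:, mod2bool]**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r / lengths)**n      # the origin has length 0 and yields inf
    return np.sum(p[np.isfinite(p)])


def loop_sum(N):
    # Reference implementation: the nested loops from the question.
    total = 0.0
    for i in range(-N, N + 1):
        for j in range(-N, N + 1):
            for k in range(-N, N + 1):
                d2 = i * i + j * j + k * k
                if (i + j + k) % 2 == 0 and d2 != 0:
                    total += (r / np.sqrt(d2))**n
    return total


print(np.isclose(vectorized_sum(4), loop_sum(4)))
```

The loop skips the origin explicitly, while the vectorized version lets it turn into inf and drops it afterwards with isfinite; both routes should agree up to floating point rounding.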
I have implemented this method two times below: once exactly as just explained, and once involving bottleneck's ss and nansum functions. I have also added your method for comparison, and a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing, but rather creates Infs, and finally sums up the isfinite way.
import time

import numpy as np
import bottleneck as bn

N = 100
n = 12
r = np.sqrt(2)

x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x, y, z)).reshape(3, -1)


def measure_time(func):
    # Decorator that prints the wall-clock duration of each call.
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func


@measure_time
def method_columnvecs(vectors):
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:, mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])


@measure_time
def method_columnvecs_opt(vectors):
    # On my system, bn.nansum is even slightly faster than np.sum.
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
    # Use ss from bottleneck or scipy.stats. axis=0 is passed explicitly
    # so that the sum of squares is taken per column.
    lengths = np.sqrt(bn.ss(vectors[:, mod2bool], axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return bn.nansum(p[np.isfinite(p)])


@measure_time
def method_original(x, y, z):
    ind = np.where((x+y+z) % 2 == 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z) != 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    p = np.sqrt(x*x+y*y+z*z)/r
    return np.sum((1/p)**n)


@measure_time
def method_original_finitesum(x, y, z):
    ind = np.where((x+y+z) % 2 == 0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    lengths = np.sqrt(x*x+y*y+z*z)
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return np.sum(p[np.isfinite(p)])


print(method_columnvecs(gridvectors))
print(method_columnvecs_opt(gridvectors))
print(method_original(x, y, z))
print(method_original_finitesum(x, y, z))
This is the output:
$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965
All methods produce the same result. Your method becomes a bit faster when doing the isfinite style sum. My methods are faster, but I would say that this is an exercise of academic nature rather than an important improvement :-)
I have one question left: you were saying that for N=3, the calculation should produce a 12. Even yours doesn't do this. All methods above produce 12.1317530867 for N=3. Is this expected?
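One way to look into this is to group the contributing grid points by their distance from the origin. The sketch below does that for N=3; the variable names are chosen for illustration and are not taken from the code above. The 12 nearest points, such as (1, 1, 0), lie at distance sqrt(2) and contribute exactly 1 each, so the value 12 is the contribution of that first shell alone, and the more distant points add the remainder.

```python
import numpy as np

N, n = 3, 12
x, y, z = np.meshgrid(*[np.arange(-N, N + 1)] * 3)
d2 = x * x + y * y + z * z
keep = ((x + y + z) % 2 == 0) & (d2 != 0)

# Group the remaining points by squared distance from the origin.
shells, counts = np.unique(d2[keep], return_counts=True)

# Each point contributes (sqrt(2)/sqrt(d2))**n, which equals (2/d2)**(n/2).
terms = counts * (2.0 / shells)**(n / 2)

for s, c, t in zip(shells, counts, terms):
    print(s, c, t)
print(terms.sum())
```

Each printed row shows the squared distance of a shell, the number of grid points on it, and the shell's total contribution to the sum.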