
3D distance vectorization

I need help vectorizing this code. Right now, with N=100, it takes a minute or so to run, and I would like to speed that up. I have done something like this for a double loop, but never for a triple (3D) loop, and I am having difficulties.

import numpy as np
N = 100
n = 12
r = np.sqrt(2)

x = np.arange(-N,N+1)
y = np.arange(-N,N+1)
z = np.arange(-N,N+1)

C = 0

for i in x:
    for j in y:
        for k in z:
            if (i+j+k)%2==0 and (i*i+j*j+k*k!=0):
                p = np.sqrt(i*i+j*j+k*k)
                p = p/r
                q = (1/p)**n
                C += q

print('\n')
print(C)

Thanks to @Bill, I was able to get this to work. It is very fast now. It could perhaps be done better, in particular the two masks that replace the two conditions I originally had inside the loops.

    from __future__ import division
    import numpy as np

    N = 100
    n = 12
    r = np.sqrt(2)

    x, y, z = np.meshgrid(*[np.arange(-N, N+1)]*3)

    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z)!=0)
    x = x[ind]
    y = y[ind]
    z = z[ind]

    p=np.sqrt(x*x+y*y+z*z)/r

    ans = (1/p)**n
    ans = np.sum(ans)
    print('ans')
    print(ans)
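On the remark about the two masks: the following is a sketch, not taken from the original posts, of how both conditions could be folded into a single boolean mask. The function names `lattice_sum_single_mask` and `lattice_sum_loops` are made up for illustration, and the triple loop is included only as a cross-check for small N.

```python
import numpy as np


def lattice_sum_single_mask(N, n=12):
    """Sum (r/d)**n over grid points with even coordinate sum, origin excluded."""
    r = np.sqrt(2)
    x, y, z = np.meshgrid(*[np.arange(-N, N + 1)] * 3)
    d2 = x * x + y * y + z * z  # squared distance of every grid point
    # One boolean mask replaces both np.where passes.
    mask = ((x + y + z) % 2 == 0) & (d2 != 0)
    return np.sum((r / np.sqrt(d2[mask])) ** n)


def lattice_sum_loops(N, n=12):
    """Reference triple loop, only practical for small N."""
    r = np.sqrt(2)
    total = 0.0
    coords = range(-N, N + 1)
    for i in coords:
        for j in coords:
            for k in coords:
                d2 = i * i + j * j + k * k
                if (i + j + k) % 2 == 0 and d2 != 0:
                    total += (r / np.sqrt(d2)) ** n
    return total


print(lattice_sum_single_mask(5))
print(lattice_sum_loops(5))
```

Computing the squared distance once and reusing it for both the mask and the final expression also avoids evaluating `x*x+y*y+z*z` twice.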

The meshgrid/where/indexing solution is already extremely fast. I made it about 65 % faster. That is not much, but I will explain it anyway, step by step:

It was easiest for me to approach this problem with all 3D vectors in the grid stored as columns of one large 2D 3 x M array. meshgrid is the right tool for creating all the combinations (note that numpy version >= 1.7 is required for a 3D meshgrid), and vstack + reshape bring the data into the desired form. Example:

>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1],
       [0, 1, 0, 1, 0, 1, 0, 1]])

Each column is one 3D vector. Each of these eight vectors represents one corner of a 1x1x1 cube (a 3D grid with step size 1 and length 1 in all dimensions).
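As a small sketch of the same construction for an arbitrary integer range, the helper below wraps the meshgrid + vstack + reshape steps. The name `grid_vectors` and its `lo`/`hi` parameters are assumptions made for this illustration; they do not appear in the original answer.

```python
import numpy as np


def grid_vectors(lo, hi):
    """Return a 3 x M array whose columns are all integer points in [lo, hi]^3."""
    axis = np.arange(lo, hi + 1)
    x, y, z = np.meshgrid(axis, axis, axis)
    # Each coordinate array has shape (L, L, L); stacking and flattening
    # gives one row per coordinate and one column per grid point.
    return np.vstack((x, y, z)).reshape(3, -1)


vectors = grid_vectors(-1, 1)
print(vectors.shape)  # 3 rows, 3**3 = 27 columns
```

For the problem at hand, `grid_vectors(-N, N)` would give the 3 x M array with M = (2N+1)**3.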

Let's call this array vectors (it contains all 3D vectors representing all points in the grid). Then, prepare a bool mask for selecting those vectors fulfilling your mod2 criterion:

    mod2bool = np.sum(vectors, axis=0) % 2 == 0

np.sum(vectors, axis=0) creates a 1D array of length M containing the element sum of each column vector. Hence, mod2bool is a 1D array of length M with one bool value per column vector. Now use this bool mask:

    vectorsubset = vectors[:,mod2bool]

This selects all rows (:) and uses boolean indexing to filter the columns; both are fast operations in numpy. Calculate the lengths of the remaining vectors, using the native numpy approach:

    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))

This is quite fast. However, scipy.stats.ss and bottleneck.ss can perform the squared sum calculation even faster than this.
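To make the mask and length steps concrete, here is a sketch that runs them on the eight cube corners from the example above. The `np.einsum` line is a NumPy-only alternative for the squared sum that was swapped in for this illustration; it is not part of the original answer, and no claim is made about its speed relative to `ss`.

```python
import numpy as np

# The eight corners of the unit cube, one per column.
vectors = np.vstack(np.meshgrid(*[np.arange(0, 2)] * 3)).reshape(3, -1)

mod2bool = np.sum(vectors, axis=0) % 2 == 0  # True where x + y + z is even
vectorsubset = vectors[:, mod2bool]          # keeps 4 of the 8 columns

lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))

# Alternative: multiply element-wise and sum over the row index i
# in a single einsum call.
lengths_einsum = np.sqrt(np.einsum('ij,ij->j', vectorsubset, vectorsubset))

print(vectorsubset)
print(lengths)
```

The surviving columns are the origin and the three corners with exactly two coordinates equal to 1, so the lengths are 0 once and sqrt(2) three times.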

Transform the lengths using your instructions:

    with np.errstate(divide='ignore'):
        p = (r/lengths)**n

This divides a finite number by zero for the zero-length vector at the origin, resulting in Inf values in the output array. This is entirely fine. We use numpy's errstate context manager to make sure that these zero divisions do not raise an exception or a runtime warning.

Now sum up the finite elements (ignoring the Infs) and return the sum:

    return  np.sum(p[np.isfinite(p)])
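A minimal sketch of the last two steps on a hand-picked array, to show what the Inf handling does. The three values in `lengths` are illustrative only; the leading 0.0 stands in for the origin.

```python
import numpy as np

r = np.sqrt(2)
n = 12
lengths = np.array([0.0, np.sqrt(2), 2.0])  # illustrative values

with np.errstate(divide='ignore'):
    p = (r / lengths) ** n  # first entry becomes inf, without a warning

finite_sum = np.sum(p[np.isfinite(p)])  # the inf entry is dropped here
print(p)
print(finite_sum)
```

The two finite terms are 1 and (1/2)**6, so the sum comes out close to 1.015625.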

I have implemented this method twice below: once exactly as just explained, and once using bottleneck's ss and nansum functions. I have also added your method for comparison, and a modified version of your method that skips the np.where((x*x+y*y+z*z)!=0) indexing, creates Infs instead, and finally sums up the isfinite way.

import sys
import numpy as np
import bottleneck as bn

N = 100
n = 12
r = np.sqrt(2)


x,y,z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x,y,z)).reshape(3, -1)


def measure_time(func):
    import time
    def modified_func(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - t0
        print("%s duration: %.3f s" % (func.__name__, duration))
        return result
    return modified_func


@measure_time
def method_columnvecs(vectors):
    mod2bool = np.sum(vectors, axis=0) % 2 == 0
    vectorsubset = vectors[:,mod2bool]
    lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  np.sum(p[np.isfinite(p)])


@measure_time
def method_columnvecs_opt(vectors):
    # On my system, bn.nansum is even slightly faster than np.sum.
    mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
    # Use ss from bottleneck or scipy.stats; axis=0 is passed explicitly,
    # since the default axis may differ between library versions.
    lengths = np.sqrt(bn.ss(vectors[:,mod2bool], axis=0))
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  bn.nansum(p[np.isfinite(p)])


@measure_time
def method_original(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    ind = np.where((x*x+y*y+z*z)!=0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    p=np.sqrt(x*x+y*y+z*z)/r
    return np.sum((1/p)**n)


@measure_time
def method_original_finitesum(x,y,z):
    ind = np.where((x+y+z)%2==0)
    x = x[ind]
    y = y[ind]
    z = z[ind]
    lengths = np.sqrt(x*x+y*y+z*z)
    with np.errstate(divide='ignore'):
        p = (r/lengths)**n
    return  np.sum(p[np.isfinite(p)])


print(method_columnvecs(gridvectors))
print(method_columnvecs_opt(gridvectors))
print(method_original(x,y,z))
print(method_original_finitesum(x,y,z))

This is the output:

$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965

All methods produce the same result. Your method becomes a bit faster when doing the isfinite-style sum. My methods are faster still, but I would say that this is an exercise of academic nature rather than an important improvement :-)

I have one question left: you were saying that for N=3, the calculation should produce 12. Even your own code doesn't do this. All methods above produce 12.1317530867 for N=3. Is this expected?
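One way to look into this question is to break the sum down by squared distance. The sketch below is an assumption-laden illustration, not the asker's confirmed intent: the helper name `shell_contributions` is made up, and it uses the identity (r/d)**n = (2/d**2)**(n/2) for r = sqrt(2). It suggests that the 12 nearest points at distance sqrt(2) contribute exactly 12, which is the whole sum for N=1, while the farther points present for N=3 add the small remainder.

```python
import numpy as np


def shell_contributions(N, n=12):
    """Group the grid points by squared distance and sum each group's terms."""
    axis = np.arange(-N, N + 1)
    x, y, z = np.meshgrid(axis, axis, axis)
    d2 = x * x + y * y + z * z
    mask = ((x + y + z) % 2 == 0) & (d2 != 0)
    # Distinct squared distances and how many grid points share each one.
    shells, counts = np.unique(d2[mask], return_counts=True)
    terms = counts * (2.0 / shells) ** (n / 2.0)
    return shells, counts, terms


shells, counts, terms = shell_contributions(3)
print(shells[0], counts[0], terms[0])  # nearest shell: squared distance 2
print(terms.sum())                     # total for N=3
print(shell_contributions(1)[2].sum()) # total for N=1
```

Note that the grid is a cube, so for the larger squared distances the counts only include the points that fit inside [-N, N]^3.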
