A more efficient way to loop?

Question

I have a small piece of code from a much larger script. I found that calls to the function t_area account for most of the run time. I tested the function by itself and it is not slow; I believe it takes a lot of time because of the number of times it has to be run. Here is the code where the function is called:

tri_area = np.zeros((numx,numy),dtype=float)
for jj in range(0,numy-1):
    for ii in range(0,numx-1):
      xp = x[ii,jj]
      yp = y[ii,jj]
      zp = surface[ii,jj]
      ap = np.array((xp,yp,zp))

      xp = xp+dx
      zp = surface[ii+1,jj]
      bp = np.array((xp,yp,zp))

      yp = yp+dx
      zp = surface[ii+1,jj+1]
      dp = np.array((xp,yp,zp))

      xp = xp-dx
      zp = surface[ii,jj+1]
      cp = np.array((xp,yp,zp))

      tri_area[ii,jj] = t_area(ap,bp,cp,dp)

The arrays used here are 216 x 217, and x and y have the same size. I am pretty new to Python coding; I have used MATLAB in the past. So my question is: is there a way to get around the two for-loops, or a more efficient way to run through this block of code in general? Looking for any help speeding this up! Thanks!

EDIT:

Thanks for the help everyone, this has cleared up a lot of confusion. I was asked about the function t_area that is used in the loop; here is the code:

# `linalg` is assumed to be numpy.linalg, imported elsewhere in the script
def t_area(a,b,c,d):
    ab=b-a
    ac=c-a
    tri_area_a = 0.5*linalg.norm(np.cross(ab,ac))

    db=b-d
    dc=c-d
    tri_area_d = 0.5*linalg.norm(np.cross(db,dc))

    ba=a-b
    bd=d-b
    tri_area_b = 0.5*linalg.norm(np.cross(ba,bd))

    ca=a-c
    cd=d-c
    tri_area_c = 0.5*linalg.norm(np.cross(ca,cd))

    av_area = (tri_area_a + tri_area_b + tri_area_c + tri_area_d)*0.5
    return av_area

Sorry for the confusing notation. It made sense at the time; looking back now, I will probably change it. Thanks!

Answer 1

A caveat before we start. range(0, numy-1), which is the same as range(numy-1), produces the numbers from 0 to numy-2, not including numy-1. That is numy-1 values in total. MATLAB has 1-based indexing while Python has 0-based, so be a bit careful with your indexing in the transition. Given that you have tri_area = np.zeros((numx, numy), dtype=float), tri_area[ii,jj] never accesses the last row or column the way you have set up your loops. Therefore, I suspect the intention was to write range(numy).
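A quick check of the point above. The sizes are hypothetical small values chosen for illustration, not the real 216 x 217 arrays; the loop only marks the cells the original double loop would write to.

```python
import numpy as np

numx, numy = 4, 5  # hypothetical small sizes, for illustration only

# range(0, numy - 1) and range(numy - 1) produce the same indices: 0 .. numy-2
indices = list(range(0, numy - 1))
same = indices == list(range(numy - 1))

# Mark every cell the question's double loop would assign
tri_area = np.zeros((numx, numy), dtype=float)
for jj in range(numy - 1):
    for ii in range(numx - 1):
        tri_area[ii, jj] = 1.0

# The last row and last column are never written
last_row_untouched = not tri_area[-1, :].any()
last_col_untouched = not tri_area[:, -1].any()
```

Whether the last row and column should be filled depends on the shape of surface: indexing surface[ii+1, jj+1] only works up to the full range if surface has one more row and column than tri_area.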

Since the function t_area() is vectorisable, you can do away with the loops completely. Vectorisation means numpy applies an operation to a whole array at once, running the loops under the hood in compiled code, where they are faster.

First, we stack all the ap vectors, one for each (i, j) element, in an (m, n, 3) array, where (m, n) is the shape of x. If we take the cross product of two (m, n, 3) arrays, the operation is applied along the last axis by default. This means that np.cross(a, b) will, for every element (i, j), take the cross product of the 3 numbers in a[i,j] and b[i,j]. Similarly, np.linalg.norm(a, axis=2) will, for every element (i, j), calculate the norm of the 3 numbers in a[i,j]. This also reduces our array to shape (m, n). A bit of caution here though: we need to state explicitly that we want this operation done along axis 2 (the last axis), otherwise norm collapses the whole array into a single number.
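To make the axis behaviour concrete, here is a minimal sketch on random data. The grid size and the array names are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 5  # hypothetical grid size
a = rng.random((m, n, 3))
b = rng.random((m, n, 3))

# Cross product along the last axis: one 3-vector per (i, j)
crossed = np.cross(a, b)                 # shape (m, n, 3)

# Norm along axis 2: one scalar per (i, j)
norms = np.linalg.norm(crossed, axis=2)  # shape (m, n)

# The same computation for a single element, for comparison
i, j = 2, 3
single = np.linalg.norm(np.cross(a[i, j], b[i, j]))

# Without axis, norm treats the array as one long vector and returns a scalar
flat = np.linalg.norm(crossed)
```

norms[i, j] agrees with single, while flat is a single number, which is why axis=2 has to be given explicitly.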

Note that in the following example my indexing may not correspond to yours. The bare minimum to make this work is for surface to have one more row and one more column than x and y.

import numpy as np

def _t_area(a, b, c):
    ab = b - a
    ac = c - a
    return 0.5 * np.linalg.norm(np.cross(ab, ac), axis=2)

def t_area(x, y, surface, dx):
    a = np.zeros(x.shape + (3,), dtype=float)  # (m, n, 3); also works for non-square x
    b = np.zeros_like(a)
    c = np.zeros_like(a)
    d = np.zeros_like(a)

    a[...,0] = x
    a[...,1] = y
    a[...,2] = surface[:-1,:-1]

    b[...,0] = x + dx
    b[...,1] = y
    b[...,2] = surface[1:,:-1]

    c[...,0] = x
    c[...,1] = y + dx
    c[...,2] = surface[:-1,1:]

    d[...,0] = b[...,0]
    d[...,1] = c[...,1]
    d[...,2] = surface[1:,1:]

    # are you sure you didn't mean 0.25???
    return 0.5 * (_t_area(a, b, c) + _t_area(d, b, c) + _t_area(b, a, d) + _t_area(c, a, d))

nx, ny = 250, 250

dx = np.random.random()
x = np.random.random((nx, ny))
y = np.random.random((nx, ny))
surface = np.random.random((nx+1, ny+1))

tri_area = t_area(x, y, surface, dx)

x in this example supports the indices 0-249, while surface supports 0-250. surface[:-1], shorthand for surface[0:-1], returns all rows from 0 up to, but not including, the last one. -1 serves the same function as end in MATLAB. So surface[:-1] returns the rows for indices 0-249. Similarly, surface[1:] returns the rows for indices 1-250, which achieves the same as your surface[ii+1].
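The correspondence between the slices and the loop indices can be checked on a small array. The sizes and the names lower, right, upper and diag are illustrative only.

```python
import numpy as np

nx, ny = 3, 4  # hypothetical small sizes
surface = np.arange((nx + 1) * (ny + 1), dtype=float).reshape(nx + 1, ny + 1)

lower = surface[:-1, :-1]  # plays the role of surface[ii, jj]
right = surface[1:, :-1]   # plays the role of surface[ii+1, jj]
upper = surface[:-1, 1:]   # plays the role of surface[ii, jj+1]
diag = surface[1:, 1:]     # plays the role of surface[ii+1, jj+1]

# Pick any loop index pair and compare with direct indexing
ii, jj = 1, 2
matches = (
    lower[ii, jj] == surface[ii, jj]
    and right[ii, jj] == surface[ii + 1, jj]
    and upper[ii, jj] == surface[ii, jj + 1]
    and diag[ii, jj] == surface[ii + 1, jj + 1]
)
```

All four slices have shape (nx, ny), so they line up element by element with x and y.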

Note: I had written this section before it was known that t_area() could be fully vectorised. So while what follows is obsolete for the purposes of this answer, I'll leave it as legacy to show what optimisations could have been made had the function not been vectorisable.

Instead of calling the function for each element, which is expensive, you should pass it x, y, surface and dx and iterate internally. That means only one function call and less overhead.

Furthermore, you shouldn't create new arrays for ap, bp, cp and dp on every iteration, which, again, adds overhead. Allocate them once outside the loop and just update their values.

One final change should be the order of the loops. Numpy arrays are row-major by default (while MATLAB is column-major), so ii performs better as the outer loop. You wouldn't notice the difference for arrays of your size, but hey, why not?
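The memory layout can be inspected directly. This sketch uses an array of the size mentioned in the question and assumes the default float64 dtype (8 bytes per element).

```python
import numpy as np

arr = np.zeros((216, 217), dtype=float)

# Default numpy arrays are C-contiguous, i.e. row-major
is_row_major = arr.flags['C_CONTIGUOUS']

# Bytes to step to the next row vs. the next column
row_step, col_step = arr.strides
```

Stepping along the second index moves 8 bytes, while stepping along the first moves a whole row (217 * 8 bytes). Keeping the second index, jj, in the inner loop therefore walks memory contiguously.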

Overall, the modified function should look like this.

def t_area(x, y, surface, dx):
    # I assume numx == x.shape[0]. If not, pass it as an extra argument.
    tri_area = np.zeros(x.shape, dtype=float)

    ap = np.zeros((3,), dtype=float)
    bp = np.zeros_like(ap)
    cp = np.zeros_like(ap)
    dp = np.zeros_like(ap)

    for ii in range(x.shape[0]-1): # do you really want range(numx-1) or just range(numx)?
        for jj in range(x.shape[1]-1):
            xp = x[ii,jj]
            yp = y[ii,jj]
            zp = surface[ii,jj]
            ap[:] = (xp, yp, zp)

            # get `bp`, `cp` and `dp` in a similar manner and compute `tri_area[ii,jj]`
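One way the elided step could be filled in, as a sketch rather than a definitive version. The names quad_area and t_area_loop are hypothetical, chosen so the per-cell helper does not clash with the outer function. quad_area repeats the arithmetic of the question's t_area, and the corner offsets follow the question's loop.

```python
import numpy as np

def quad_area(a, b, c, d):
    """Same arithmetic as the question's t_area (hypothetical name)."""
    def tri(p, q, r):
        # Area of the triangle with corner p and edges p->q, p->r
        return 0.5 * np.linalg.norm(np.cross(q - p, r - p))
    return 0.5 * (tri(a, b, c) + tri(d, b, c) + tri(b, a, d) + tri(c, a, d))

def t_area_loop(x, y, surface, dx):
    tri_area = np.zeros(x.shape, dtype=float)

    # Allocate the four corner vectors once and overwrite them in the loop
    ap = np.zeros(3, dtype=float)
    bp = np.zeros_like(ap)
    cp = np.zeros_like(ap)
    dp = np.zeros_like(ap)

    for ii in range(x.shape[0] - 1):
        for jj in range(x.shape[1] - 1):
            xp = x[ii, jj]
            yp = y[ii, jj]
            ap[:] = (xp, yp, surface[ii, jj])
            bp[:] = (xp + dx, yp, surface[ii + 1, jj])
            cp[:] = (xp, yp + dx, surface[ii, jj + 1])
            dp[:] = (xp + dx, yp + dx, surface[ii + 1, jj + 1])
            tri_area[ii, jj] = quad_area(ap, bp, cp, dp)

    return tri_area

# Flat surface: every visited cell should have area dx * dx
demo = t_area_loop(np.zeros((3, 4)), np.zeros((3, 4)), np.zeros((3, 4)), 2.0)
```

On a flat surface each visited cell comes out as dx * dx, which suggests the 0.5 factor averages the two ways of splitting the quadrilateral along a diagonal.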

Answer 1: 2, accepted, 2016-01-10 15:29:53
