Optimizing a nested for-loop which uses the indices of an array for function

Question

Imagine an empty 3x4 NumPy array for which you know the coordinates of the top-left corner and the step size in the horizontal and vertical directions. I would like to compute the coordinates of the center of each cell for the whole array, like this:

[image: enter image description here]

For this I implemented a nested for-loop.

In [12]:
import numpy as np
# extent = (topleft_x, stepsize_x, 0, topleft_y, 0, stepsize_y); stepsize_y is negative since the origin is top-left
extent = (5530000.0, 5000.0, 0.0, 807000.0, 0.0, -5000.0)

array = np.zeros([3,4],object) 
cols = array.shape[0]
rows = array.shape[1]

# function to apply to each cell
def f(x,y):
    return x*extent[1]+extent[0]+extent[1]/2, y*extent[5]+extent[3]+extent[5]/2

# nested for-loop
def nestloop(cols,rows):
    for col in range(cols):
        for row in range(rows):        
            array[col,row] = f(col,row)             

In [13]: 
%timeit nestloop(cols,rows)
100000 loops, best of 3: 17.4 µs per loop

In [14]: 
array.T
Out[14]:
array([[(5532500.0, 804500.0), (5537500.0, 804500.0), (5542500.0, 804500.0)],
       [(5532500.0, 799500.0), (5537500.0, 799500.0), (5542500.0, 799500.0)],
       [(5532500.0, 794500.0), (5537500.0, 794500.0), (5542500.0, 794500.0)],
       [(5532500.0, 789500.0), (5537500.0, 789500.0), (5542500.0, 789500.0)]], dtype=object)

I'm eager to learn, though: how can I optimize this? I was thinking of vectorizing it or using a lambda. I tried to vectorize it as follows:

array[:,:] = np.vectorize(f)(cols,rows)
ValueError: could not broadcast input array from shape (2) into shape (3,4)

But then I got a broadcasting error. Currently the array is 3 by 4, but it could also become 3000 by 4000.

Answer 1

The way you are computing the x and y coordinates is highly inefficient because it's not vectorized at all. You can do this instead:

In [1]: import numpy as np

In [2]: extent = (5530000.0, 5000.0, 0.0, 807000.0, 0.0, -5000.0)
   ...: x_steps = np.array([0,1,2]) * extent[1]
   ...: y_steps = np.array([0,1,2,3]) * extent[-1]
   ...: 

In [3]: x_coords = extent[0] + x_steps + extent[1]/2
   ...: y_coords = extent[3] + y_steps + extent[-1]/2
   ...: 

In [4]: x_coords
Out[4]: array([ 5532500.,  5537500.,  5542500.])

In [5]: y_coords
Out[5]: array([ 804500.,  799500.,  794500.,  789500.])
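
The hard-coded index arrays above can be generalized to any grid size with np.arange. This is only a minimal sketch, assuming the same extent layout as in the question; the helper name cell_centers and its parameters n_x and n_y are made up for illustration:

```python
import numpy as np

# Same layout as in the question:
# (topleft_x, stepsize_x, 0, topleft_y, 0, stepsize_y)
extent = (5530000.0, 5000.0, 0.0, 807000.0, 0.0, -5000.0)

def cell_centers(extent, n_x, n_y):
    """Hypothetical helper: 1-D arrays of cell-center x and y coordinates."""
    x0, dx, _, y0, _, dy = extent
    # origin + index * step + half a step, computed for all indices at once
    x_coords = x0 + np.arange(n_x) * dx + dx / 2
    y_coords = y0 + np.arange(n_y) * dy + dy / 2
    return x_coords, y_coords

x_coords, y_coords = cell_centers(extent, 3, 4)
```

For the 3000 by 4000 case mentioned in the question, only the last two arguments would need to change.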

At this point the coordinates of the points are given by the Cartesian product (itertools.product()) of these two arrays:

In [5]: import itertools as it
   ...: list(it.product(x_coords, y_coords))
Out[5]: [(5532500.0, 804500.0), (5532500.0, 799500.0), (5532500.0, 794500.0), (5532500.0, 789500.0), (5537500.0, 804500.0), (5537500.0, 799500.0), (5537500.0, 794500.0), (5537500.0, 789500.0), (5542500.0, 804500.0), (5542500.0, 799500.0), (5542500.0, 794500.0), (5542500.0, 789500.0)]

You just have to group them 4 by 4.
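
One way to do that grouping, sketched here under the assumption that a (3,4,2) float array is an acceptable result, is to convert the list of pairs to an array and reshape it. The names pairs and grid are illustrative only:

```python
import itertools as it
import numpy as np

x_coords = np.array([5532500.0, 5537500.0, 5542500.0])
y_coords = np.array([804500.0, 799500.0, 794500.0, 789500.0])

# product() varies y fastest, so each run of len(y_coords) pairs shares one x
pairs = list(it.product(x_coords, y_coords))

# (12, 2) array of pairs, regrouped so that grid[i, j] == (x_coords[i], y_coords[j])
grid = np.array(pairs).reshape(len(x_coords), len(y_coords), 2)
```

Because this builds a Python list of tuples first, it is likely to be slow for large grids; it is shown only to make the "4 by 4" grouping concrete.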

To obtain the product with numpy you can do (based on this answer):

In [6]: np.transpose([np.tile(x_coords, len(y_coords)), np.repeat(y_coords, len(x_coords))])
Out[6]: 
array([[ 5532500.,   804500.],
       [ 5537500.,   804500.],
       [ 5542500.,   804500.],
       [ 5532500.,   799500.],
       [ 5537500.,   799500.],
       [ 5542500.,   799500.],
       [ 5532500.,   794500.],
       [ 5537500.,   794500.],
       [ 5542500.,   794500.],
       [ 5532500.,   789500.],
       [ 5537500.,   789500.],
       [ 5542500.,   789500.]])

Which can be reshaped:

In [8]: product.reshape((3,4,2))   # product is the result of the above
Out[8]: 
array([[[ 5532500.,   804500.],
        [ 5537500.,   804500.],
        [ 5542500.,   804500.],
        [ 5532500.,   799500.]],

       [[ 5537500.,   799500.],
        [ 5542500.,   799500.],
        [ 5532500.,   794500.],
        [ 5537500.,   794500.]],

       [[ 5542500.,   794500.],
        [ 5532500.,   789500.],
        [ 5537500.,   789500.],
        [ 5542500.,   789500.]]])

If this is not the order you want you can do something like:

In [9]: ar = np.zeros((3,4,2), float)
    ...: ar[0] = product[::3]
    ...: ar[1] = product[1::3]
    ...: ar[2] = product[2::3]
    ...: 

In [10]: ar
Out[10]: 
array([[[ 5532500.,   804500.],
        [ 5532500.,   799500.],
        [ 5532500.,   794500.],
        [ 5532500.,   789500.]],

       [[ 5537500.,   804500.],
        [ 5537500.,   799500.],
        [ 5537500.,   794500.],
        [ 5537500.,   789500.]],

       [[ 5542500.,   804500.],
        [ 5542500.,   799500.],
        [ 5542500.,   794500.],
        [ 5542500.,   789500.]]])

I believe there are better ways to do this last reshaping, but I'm not a numpy expert.
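
As a possible alternative to the manual slicing, here is a sketch that uses np.meshgrid with indexing="ij", plus an equivalent reshape-and-transpose of the product array. It assumes the desired layout is ar[i, j] == (x_coords[i], y_coords[j]), as in the output above; the names xx, yy, ar_mesh and ar_alt are illustrative:

```python
import numpy as np

x_coords = np.array([5532500.0, 5537500.0, 5542500.0])
y_coords = np.array([804500.0, 799500.0, 794500.0, 789500.0])

# Option 1: indexing="ij" makes the first axis follow x and the second follow y
xx, yy = np.meshgrid(x_coords, y_coords, indexing="ij")
ar_mesh = np.stack([xx, yy], axis=-1)  # shape (3, 4, 2)

# Option 2: the tile/repeat product is ordered with y as the slow axis,
# so reshape to (4, 3, 2) first and then swap the first two axes
product = np.transpose([np.tile(x_coords, len(y_coords)),
                        np.repeat(y_coords, len(x_coords))])
ar_alt = product.reshape(len(y_coords), len(x_coords), 2).transpose(1, 0, 2)
```

Both options avoid writing one assignment per row, so they should carry over unchanged to larger grids.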

Note that using object as the dtype carries a huge performance penalty, since numpy cannot optimize anything (and it is sometimes slower than using normal lists). I have used a (3,4,2) array instead, which allows faster operations.
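
To get a rough feel for that penalty, the following sketch times the same multiplication on a float array and on an object array holding the same values. The array size and repetition count are arbitrary assumptions, and the actual timings will depend on the machine and the numpy version:

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size, chosen only for illustration
float_arr = np.arange(n, dtype=float)
object_arr = float_arr.astype(object)  # same values, stored as Python floats

# Both give the same numbers, but the object version goes through
# Python-level arithmetic for every element instead of a compiled loop.
float_result = float_arr * 2
object_result = object_arr * 2

t_float = timeit.timeit(lambda: float_arr * 2, number=20)
t_object = timeit.timeit(lambda: object_arr * 2, number=20)
print(f"float dtype:  {t_float:.4f} s")
print(f"object dtype: {t_object:.4f} s")
```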

1 answer

solution1
3 ACCEPTED 2013-12-03 07:47:36
