I am building a 2d array from 1d arrays in numpy (Python 2.7), and I am looking for the most efficient way to do it. So far I have come up with:
import numpy as np

a = np.ones(100000)
n_dim = 3

# SUBSCRIPTING
for i in xrange(0, 1000):
    x = np.zeros(shape=(100000, n_dim))
    for j in xrange(0, n_dim):
        x[:, j] = a*j
# ~1.574 s/1000 loops - joining 3 1d arrays
# ~9.162 s/1000 loops - joining 10 1d arrays
# STACKING
for i in xrange(0, 1000):
    x = a*0.
    for j in xrange(1, n_dim):
        x = np.vstack((x, a*j))
    x = x.T
# ~1.786 s/1000 loops - joining 3 1d arrays
# ~16.603 s/1000 loops - joining 10 1d arrays
The first method (subscripting) is the fastest I came up with, and its performance gain over the second method (stacking) grows with the number of 1d arrays I am joining. As I need to repeat this step quite a bit, I wonder if there is something faster? I am willing to go with a solution that loses in clarity if it offers a significant performance boost.
Maybe I can try stacking arrays in a way that limits the number of stacking operations (e.g., in the case of joining four 1d arrays: first stack arrays 1 and 2, then arrays 3 and 4, and finally stack the two resulting arrays).
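That pairwise idea can be sketched as a tree reduction over the list of rows (`tree_vstack` is a hypothetical helper, not from the original post; in practice a single `vstack` over the whole list is simpler and usually faster, as the answers below show):

```python
import numpy as np

def tree_vstack(rows):
    # Pairwise (tree) reduction: stack neighbours, halving the list
    # each pass, so each element is copied O(log n) times rather
    # than O(n) times as in the iterative vstack loop.
    rows = list(rows)
    while len(rows) > 1:
        rows = [np.vstack(rows[i:i + 2]) for i in range(0, len(rows), 2)]
    return rows[0]

a = np.ones(100000)
n_dim = 4
x = tree_vstack([a * j for j in range(n_dim)]).T
print(x.shape)  # (100000, 4)
```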
My question is about efficiently building a 2d array from 1d arrays. The values in the arrays used here are dummies; in the real application, most of the values in the 1d arrays I am joining will differ.
Because numpy stores arrays (by default) in row-major order, it is more efficient to set the values row by row. Therefore, I would use:
x = np.zeros(shape=(n_dim, 100000))
for j in range(0, n_dim):
    x[j, :] = a*j
Alternatively, you can define x to be column-major; then this is as fast as the previous code:
x = np.zeros(shape=(100000, n_dim), order='F')
for j in range(0, n_dim):
    x[:, j] = a*j
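As a quick check (a small sketch, not from the original answer), the array's flags confirm the column-major layout, which is why the column assignments now write contiguous memory:

```python
import numpy as np

a = np.ones(100000)
n_dim = 3

# Column-major target: each x[:, j] assignment fills a contiguous block.
x = np.zeros(shape=(100000, n_dim), order='F')
for j in range(n_dim):
    x[:, j] = a * j

print(x.flags['F_CONTIGUOUS'])  # True
```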
You could also create x with the numpy outer product:
v = np.arange(n_dim)
x = np.outer(v, a)
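For these particular dummy values (row j equal to a*j), the outer product reproduces the loop's result exactly; a minimal sketch, with an equivalent broadcasting form added as an assumption of mine rather than part of the original answer:

```python
import numpy as np

a = np.ones(100000)
n_dim = 3

v = np.arange(n_dim)
x = np.outer(v, a)       # shape (n_dim, 100000); row j equals a*j

# Equivalent via broadcasting: (n_dim, 1) * (100000,) -> (n_dim, 100000)
x2 = v[:, None] * a

print(x.shape)  # (3, 100000)
```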
This is a poor way of using vstack; you are calling it repeatedly, creating a new x for each j:
x = a*0.
for j in xrange(1, n_dim):
    x = np.vstack((x, a*j))
x = x.T
The correct way is to build a list of the arrays and join them with a single call (here np.array; vstack works the same way):
xlist = []
for j in xrange(0, n_dim):
    xlist.append(a*j)
x = np.array(xlist).T
In this context append works just as well as vstack, and may be faster. There is also a column_stack function. The key difference is that I am taking advantage of the fast list append, and of the ability of array (and vstack) to take many items in their argument list.
It's even better if you can write the loop as a list comprehension:
x = np.array([a*j for j in xrange(0, n_dim)])
Insertion into a preallocated array is often the fastest choice, but you should be familiar with this build-from-a-list method.
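The two approaches produce identical results; a minimal sketch comparing preallocated insertion with the build-from-a-list method:

```python
import numpy as np

a = np.ones(100000)
n_dim = 3

# Preallocated insertion, filling one row at a time.
x1 = np.zeros((n_dim, a.shape[0]))
for j in range(n_dim):
    x1[j, :] = a * j

# Build from a list, converting once at the end.
x2 = np.array([a * j for j in range(n_dim)])

print(np.array_equal(x1, x2))  # True
```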
The basic np.array expression

np.array([[1,2,3],[4,5,6]])

is just this: building 2d from a list of 1d arrays (or, in this case, lists).

np.array([a*0, a*1, a*2])
jakub noted that np.array is slow. For n_dim=10:
In [257]: timeit x=np.array([(a*j) for j in range(n_dim)]).T
1 loops, best of 3: 228 ms per loop
In [258]: timeit x=np.array([(a*j).tolist() for j in range(n_dim)]).T
1 loops, best of 3: 228 ms per loop
Apparently np.array converts the input arrays to lists and then does its usual construction from nested lists (or something equivalent).
In [259]: timeit x=np.vstack([(a*j) for j in range(n_dim)]).T
10 loops, best of 3: 24.9 ms per loop
vstack on the list of arrays is considerably faster: faster than the iterative vstack (which I expected), and basically the same as Ramon's row insertion (and insertion into an order='F' array):
In [272]: %%timeit
   .....: x=np.zeros((n_dim,a.shape[0]))
   .....: for j in range(n_dim):
   .....:     x[j,:]=a*j
   .....: x=x.T
   .....:
10 loops, best of 3: 23.3 ms per loop
While concatenate (used by vstack) is compiled, I suspect it does something similar to the iterative insertion: it is common in the numpy C source to create an empty target array and then fill it with the appropriate values.
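A rough Python-level sketch of that suspected pattern (allocate the target once, then copy each input into its slot); this is an illustration of the idea, not the actual concatenate source:

```python
import numpy as np

arrs = [np.ones(5) * j for j in range(3)]

# Allocate the full target once, then fill it row by row,
# mimicking the empty-then-fill pattern described above.
out = np.empty((len(arrs), arrs[0].shape[0]))
for i, arr in enumerate(arrs):
    out[i, :] = arr

print(np.array_equal(out, np.vstack(arrs)))  # True
```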