
Concatenate big numpy arrays

Let's assume I have some NumPy arrays a and b where a.shape is (N, 5000) and b.shape is (N, 2500). N is some number of samples which may vary based on my problem/algorithm, but it's always the same for a and b.

Now I want another array c of shape (N, 7500) which holds a's values in [0:5000] and b's values in [5000:7500].

Currently I am creating a zero-filled buffer array and slicing the values into it:

import numpy as np

# ...retrieving a
# ...retrieving b
c = np.zeros((N, 7500), dtype=np.float32)

# insert values of a
c[:, 0:5000] = a

# insert values of b
c[:, 5000:7500] = b

# free up memory
del a, b

Is this a fast/efficient (and therefore "pythonic"/"numpy'ish") way to do so? Or do better solutions exist in terms of memory consumption or computing time?

a and b are loaded from somewhere else and preprocessed, so it is not an option to write the data directly into a buffer c without creating a and b first.
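For reference, here is a tiny runnable version of the buffer approach described above (the column counts 5 and 3 are stand-ins for 5000 and 2500, so the shapes stay readable):

```python
import numpy as np

N = 4
a = np.random.rand(N, 5).astype(np.float32)   # stands in for (N, 5000)
b = np.random.rand(N, 3).astype(np.float32)   # stands in for (N, 2500)

# zero-filled buffer, then slice-assign both blocks into it
c = np.zeros((N, 8), dtype=np.float32)
c[:, 0:5] = a
c[:, 5:8] = b

assert np.array_equal(c[:, 0:5], a)
assert np.array_equal(c[:, 5:8], b)

del a, b  # free the source arrays once copied
```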

For these sizes, using hstack is reasonable.

c = np.hstack([a, b]) would do what you want. See also np.concatenate.


Timeit results

a = np.ones((1000,5000), dtype=np.float64)
b = np.ones((1000,2500), dtype=np.float64)

%timeit c = np.concatenate([a,b], axis=1)
1000 loops, best of 3: 66.4 ms per loop

%timeit c = np.hstack([a,b])
1000 loops, best of 3: 67.3 ms per loop

# Check that it is really the same:
np.testing.assert_array_equal(np.concatenate([a,b], axis=1), np.hstack([a,b]))

So concatenate is probably a bit faster, because hstack is just a wrapper (an extra function call) around concatenate.
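A quick way to convince yourself of that wrapper relationship: for 2-D inputs, hstack joins along axis 1, exactly like concatenate with axis=1 (for 1-D inputs it joins along axis 0). A small check:

```python
import numpy as np

a = np.ones((3, 4))
b = np.zeros((3, 2))

# 2-D: hstack joins along axis 1, same result as concatenate(axis=1)
assert np.array_equal(np.hstack([a, b]), np.concatenate([a, b], axis=1))

# 1-D: hstack joins along axis 0
x, y = np.arange(3), np.arange(2)
assert np.array_equal(np.hstack([x, y]), np.concatenate([x, y], axis=0))
```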

For reference:

%%timeit
c = np.zeros((1000, 7500), dtype=np.float64)

# insert values of a
c[:, 0:5000] = a

# insert values of b
c[:, 5000:7500] = b

1000 loops, best of 3: 69.7 ms per loop

seems almost as fast as concatenate. But that's only because the first axis was 1000. If I change the first axis to contain only 10 elements, the timings are completely different:

a = np.ones((10,5000), dtype=np.float64)
b = np.ones((10,2500), dtype=np.float64)

# concatenate
1000 loops, best of 3: 349 µs per loop
# hstack
1000 loops, best of 3: 406 µs per loop
# your approach
1000 loops, best of 3: 452 µs per loop

hstack is considerably faster:

a = np.ones((5, 2500)).astype(np.float32)
b = np.zeros((5, 5000)).astype(np.float32)
n = 5

%%timeit
c = np.zeros((n, 7500)).astype(np.float32)
c[:, :2500] = a
c[:, 2500:] = b
10000 loops, best of 3: 70 µs per loop

%timeit c = np.hstack((a, b))
10000 loops, best of 3: 27 µs per loop

If you use small arrays, hstack is a bit slower than the other solution. In terms of memory usage both approaches should be similar.
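One small tweak worth trying if you stick with the buffer approach: since every element of c gets overwritten by the two slice assignments, np.empty can replace np.zeros and skip the zero-fill pass. This is a sketch, not a measured claim for your sizes, so time it on your own data:

```python
import numpy as np

N = 100
a = np.random.rand(N, 5000).astype(np.float32)
b = np.random.rand(N, 2500).astype(np.float32)

# np.empty skips initialization -- safe here only because
# every column of c is filled by the assignments below
c = np.empty((N, 7500), dtype=np.float32)
c[:, :5000] = a
c[:, 5000:] = b

assert np.array_equal(c[:, :5000], a)
assert np.array_equal(c[:, 5000:], b)
```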
