Let's assume I have NumPy arrays a and b where a.shape is (N, 5000) and b.shape is (N, 2500). N is some number of samples that may vary based on my problem/algorithm, but it is always the same for a and b.
Now I want another array c of shape (N, 7500) which holds a's values in [0:5000] and b's values in [5000:7500].
Currently I am creating a zero-filled buffer array and slicing the values into it:
import numpy as np
# ...retrieving a
# ...retrieving b
c = np.zeros((N, 7500), dtype=np.float32)
# insert values of a
c[:, 0:5000] = a
# insert values of b
c[:, 5000:7500] = b
# free up memory
del a, b
Is this a fast / efficient (and therefore "pythonic" / "NumPy'ish") way to do so, or do better solutions exist in terms of space/memory consumption or computing time?
a and b are loaded from somewhere else and preprocessed, so it is not an option to somehow insert the data directly into a buffer c without creating a and b first.
For these sizes, using hstack is reasonable.
c = np.hstack([a, b])
would do what you want. See also np.concatenate.
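A quick sanity check of the resulting shape and layout (small placeholder arrays, with an assumed sample count purely for illustration):

```python
import numpy as np

N = 4  # small placeholder sample count
a = np.ones((N, 5000), dtype=np.float32)
b = np.zeros((N, 2500), dtype=np.float32)

# stack the two blocks side by side along the second axis
c = np.hstack([a, b])
print(c.shape)  # (4, 7500)
print(c.dtype)  # float32
```

The first 5000 columns of c hold a's values and the remaining 2500 hold b's, matching the slicing layout from the question.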
Timeit results
a = np.ones((1000,5000), dtype=np.float64)
b = np.ones((1000,2500), dtype=np.float64)
%timeit c = np.concatenate([a,b], axis=1)
1000 loops, best of 3: 66.4 ms per loop
%timeit c = np.hstack([a,b])
1000 loops, best of 3: 67.3 ms per loop
# Check that it is really the same:
np.testing.assert_array_equal(np.concatenate([a,b], axis=1), np.hstack([a,b]))
So concatenate is probably a bit faster, because hstack is just a wrapper (an extra function call) around concatenate.
As a reference:
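The wrapper relationship can be seen behaviorally: hstack is equivalent to concatenate along axis 0 for 1-D inputs and along axis 1 for higher-dimensional inputs (a small demonstration, not the library's internals):

```python
import numpy as np

x = np.arange(3)
y = np.arange(3, 6)
# for 1-D inputs, hstack concatenates along axis 0
print(np.array_equal(np.hstack([x, y]), np.concatenate([x, y], axis=0)))  # True

X = np.arange(6).reshape(2, 3)
Y = np.arange(6, 12).reshape(2, 3)
# for 2-D inputs, hstack concatenates along axis 1
print(np.array_equal(np.hstack([X, Y]), np.concatenate([X, Y], axis=1)))  # True
```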
%%timeit
c = np.zeros((1000, 7500), dtype=np.float64)
# insert values of a
c[:, 0:5000] = a
# insert values of b
c[:, 5000:7500] = b
1000 loops, best of 3: 69.7 ms per loop
seems almost as fast as concatenate. But that's only because the first axis was 1000. If I change the first axis to contain only 10 elements, the timings are completely different:
a = np.ones((10,5000), dtype=np.float64)
b = np.ones((10,2500), dtype=np.float64)
# concatenate
1000 loops, best of 3: 349 µs per loop
# hstack
1000 loops, best of 3: 406 µs per loop
# your approach
1000 loops, best of 3: 452 µs per loop
hstack is considerably faster:
a = np.ones((5, 2500)).astype(np.float32)
b = np.zeros((5, 5000)).astype(np.float32)
n = 5
%%timeit
c = np.zeros((n, 7500)).astype(np.float32)
c[:, :2500] = a
c[:, 2500:] = b
10000 loops, best of 3: 70 µs per loop
%timeit c = np.hstack((a, b))
10000 loops, best of 3: 27 µs per loop
For small arrays like this, hstack is considerably faster than the buffer approach. In terms of memory usage, both approaches should be similar.
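If you do want to keep an explicit buffer, two small variations (a sketch, not benchmarked here) can avoid redundant work: np.empty skips the zero-fill that np.zeros performs, since every element gets overwritten anyway, and concatenate in modern NumPy accepts an out parameter to write directly into a preallocated array:

```python
import numpy as np

N = 10
a = np.ones((N, 5000), dtype=np.float32)
b = np.zeros((N, 2500), dtype=np.float32)

# np.empty allocates without the zero-fill that np.zeros performs
c = np.empty((N, 7500), dtype=np.float32)
c[:, :5000] = a
c[:, 5000:] = b

# alternatively, let concatenate fill a preallocated buffer directly
c2 = np.empty((N, 7500), dtype=np.float32)
np.concatenate([a, b], axis=1, out=c2)
```

Both variants produce the same result as the zero-filled buffer approach; whether the saved zero-fill matters depends on the array sizes, so it is worth timing on your own data.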