How to convert a list of numpy arrays into a single numpy array?

Suppose I have:

LIST = [array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5])] # list elements are numpy arrays

I am trying to convert it to:

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])

I am currently solving this by calling vstack iteratively, but that is really slow, especially for a large LIST.

What would you suggest as the most efficient way?

In general you can concatenate a whole sequence of arrays along any axis:

numpy.concatenate( LIST, axis=0 )

but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3x5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality first.
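As a minimal sketch of the dimensionality point (the variable names are illustrative, not from the question): concatenating 1-D arrays directly joins them end to end, so each one is given a leading axis of length 1 before being passed to numpy.concatenate.

```python
import numpy as np

# Three 1-D arrays of shape (5,), standing in for the question's LIST.
LIST = [np.array([1, 2, 3, 4, 5]) for _ in range(3)]

# Concatenating 1-D arrays along axis 0 joins them end to end.
flat = np.concatenate(LIST, axis=0)
print(flat.shape)  # (15,)

# Give each array a leading axis of length 1 (shape (1, 5)), then concatenate.
rows = [a[np.newaxis, :] for a in LIST]
result = np.concatenate(rows, axis=0)
print(result.shape)  # (3, 5)
```

a.reshape(1, -1) would do the same job as a[np.newaxis, :] here.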

As Jorge's answer points out, there is also the function stack, introduced in numpy 1.10:

numpy.stack( LIST, axis=0 )

This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape, even along the axis of concatenation.
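A small sketch of that behaviour, using made-up input arrays: the axis argument decides where the new dimension goes, and inputs with differing shapes are rejected.

```python
import numpy as np

# Three 1-D arrays of identical shape (5,); the values are arbitrary.
LIST = [np.arange(5), np.arange(5, 10), np.arange(10, 15)]

# New axis inserted at position 0: each input becomes one row.
stacked = np.stack(LIST, axis=0)
print(stacked.shape)  # (3, 5)

# New axis inserted at position 1: each input becomes one column.
columns = np.stack(LIST, axis=1)
print(columns.shape)  # (5, 3)

# Inputs whose shapes differ raise a ValueError.
try:
    np.stack([np.arange(5), np.arange(4)])
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("stack failed:", exc)
```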

vstack (or equivalently its alias row_stack, which is deprecated as of NumPy 2.0) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:

numpy.vstack( LIST )
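To illustrate the automatic expansion, here is a short sketch (with made-up arrays) that mixes a 1-D array and a 2-D array in one call:

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5])             # shape (5,)
block = np.array([[6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15]])    # shape (2, 5)

# The 1-D arrays are treated as 1-by-5 rows; the 2-D array is used as is.
result = np.vstack([row, block, row])
print(result.shape)  # (4, 5)
```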

This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.

There is also an analogous function column_stack and shortcut c_[...], for horizontal (column-wise) stacking, as well as an almost-analogous function hstack, although for some reason the latter is less flexible (it is stricter about input arrays' dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).
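The difference between these is easiest to see on two small 1-D arrays (the names a and b are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# column_stack and c_ treat each 1-D array as a column of the result.
cols = np.column_stack([a, b])
via_c = np.c_[a, b]
print(cols.shape, via_c.shape)  # (3, 2) (3, 2)

# hstack joins 1-D arrays end to end instead.
flat = np.hstack([a, b])
print(flat.shape)  # (6,)
```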

Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:

numpy.array( LIST )

...because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
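A quick check of this on a list shaped like the one in the question (assuming all the arrays have the same length), comparing the result against vstack:

```python
import numpy as np

# Three equal-length 1-D arrays, as in the question.
LIST = [np.array([1, 2, 3, 4, 5]) for _ in range(3)]

result = np.array(LIST)
print(result.shape)  # (3, 5)

# For equal-length 1-D inputs this matches what vstack produces.
same_as_vstack = np.array_equal(result, np.vstack(LIST))
print(same_as_vstack)  # True
```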

Starting in NumPy version 1.10, we have the function stack. It can stack arrays of any dimension, as long as they all have the same shape:

import numpy as np

# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]

# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True

M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)

# These are all true
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True

Enjoy.

I checked some of the methods for speed and found that there is no significant difference! The only difference is that with some methods you must check the dimensions carefully.

Timing (seconds):

|----------------|----------------|-------------------|
|                | shape (10000,) |  shape (1, 10000) |
|----------------|----------------|-------------------|
| np.concatenate |    0.18280     |      0.17960      |
|----------------|----------------|-------------------|
|    np.stack    |    0.21501     |      0.16465      |
|----------------|----------------|-------------------|
|   np.vstack    |    0.21501     |      0.17181      |
|----------------|----------------|-------------------|
|    np.array    |    0.21656     |      0.16833      |
|----------------|----------------|-------------------|

As you can see, I ran two experiments, using np.random.rand(10000) and np.random.rand(1, 10000). Note that if we use 2-D arrays, then np.stack and np.array create an additional dimension: result.shape becomes (10000, 1, 10000) (or (1, 10000, 10000) if np.stack is given axis=1) instead of (10000, 10000), so they need an additional step to avoid this.
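One possible way to handle that extra dimension, sketched on much smaller arrays than the benchmark uses (the sizes 4 and 8 are arbitrary): either remove the length-1 axis after stacking, or use a function that does not add an axis in the first place.

```python
import numpy as np

# Small stand-ins for the benchmark's arrays of shape (1, 10000).
arrays = [np.random.rand(1, 8) for _ in range(4)]

# Stacking 2-D inputs adds a third axis.
stacked = np.stack(arrays, axis=0)
print(stacked.shape)  # (4, 1, 8)

# Option 1: drop the length-1 axis afterwards.
squeezed = stacked.squeeze(axis=1)

# Option 2: reshape to (number of arrays, row length).
reshaped = stacked.reshape(len(arrays), -1)

# Option 3: concatenate the existing rows, which adds no new axis.
direct = np.concatenate(arrays, axis=0)

print(squeezed.shape, reshaped.shape, direct.shape)  # (4, 8) (4, 8) (4, 8)
```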

Code:

from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np

# Build 10000 random 1-D arrays of length 10000
# (about 800 MB of float64 data in total).
l = []
for i in tqdm_notebook(range(10000)):
    new_np = np.random.rand(10000)
    l.append(new_np)

start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')

start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')

start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')

start = perf_counter()
# Note: the reshape to (1, 10000) is included in the np.concatenate timing.
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')
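For anyone who wants to repeat the comparison without needing a notebook or several gigabytes of memory, here is a rough sketch using the standard timeit module. The array sizes and repeat counts are arbitrary choices of mine, not the ones used for the table above, and the reshape is done outside the timed region.

```python
import timeit
import numpy as np

# Much smaller than the 10000 x 10000 case so it runs quickly.
arrays = [np.random.rand(1000) for _ in range(1000)]
rows = [a.reshape(1, -1) for a in arrays]  # reshaped before timing starts

candidates = {
    "np.stack": lambda: np.stack(arrays, axis=0),
    "np.vstack": lambda: np.vstack(arrays),
    "np.array": lambda: np.array(arrays),
    "np.concatenate": lambda: np.concatenate(rows, axis=0),
}

# Run each candidate once to confirm they all build the same 2-D array.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 runs of 5 calls each, reported per call.
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name}: {best:.5f} s per call")
```

The absolute numbers will depend on the machine and the NumPy version, so they are best read as a relative comparison only.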
