What is the efficient way to fill a 3d array based on a 2d array?

Assume I have a 2d array.

a = np.array([[0,2,3],[4,2,1]])

Its shape is number_of_instances * 3, and each value in the 2d array is a row index into a pandas dataframe.

I have a dataframe:

df = pd.DataFrame(np.array([[10, 10, 10, 10], [11, 11, 11, 11], [12, 12, 12, 12], [13, 13, 13, 13], [14, 14, 14, 14]]), columns = list('ABCD'))

Out[23]: 
   A   B   C   D
0  10  10  10  10
1  11  11  11  11
2  12  12  12  12
3  13  13  13  13
4  14  14  14  14

Now I have a zero 3d array, and I am trying to fill it with the values from the pandas dataframe.

b = np.zeros((2, 3, 4))

Its shape is number_of_instances * 3 * number_of_features, where the features for each entry are the dataframe row selected by the corresponding row index in the 2d array.

Ideally, I would expect b to look like:

Out[24]:
array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],
       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])

What is the most efficient way to fill this 3d array?

Looks like you just need indexing:

df.to_numpy()[a]

array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],

       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
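
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the a and df from the question. It assumes the dataframe keeps its default integer index, since to_numpy() followed by [a] selects rows by position, not by label.

```python
import numpy as np
import pandas as pd

# Data from the question
a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)

# Indexing a 2d array with an integer array of shape (2, 3) picks one full row
# per index, so the result has shape a.shape + (number_of_columns,).
b = df.to_numpy()[a]

print(b.shape)  # (2, 3, 4)
```

No preallocated array is needed here: the indexing expression builds the 3d result directly.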

How about:

df.loc[a.ravel()].values.reshape((2,3,4))

Output:

array([[[10, 10, 10, 10],
        [12, 12, 12, 12],
        [13, 13, 13, 13]],

       [[14, 14, 14, 14],
        [12, 12, 12, 12],
        [11, 11, 11, 11]]])
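
If you would rather not hard-code (2,3,4), the target shape can be derived from a and df. This is only a sketch of the same idea. Note that df.loc selects by index label, so it matches positional indexing only while the dataframe has its default 0..n-1 index (df.iloc would be the positional variant).

```python
import numpy as np
import pandas as pd

# Data from the question
a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)

# Flatten the indices, select the rows by label, then restore the leading
# dimensions of `a` and append the number of columns as the last axis.
target_shape = a.shape + (df.shape[1],)
b = df.loc[a.ravel()].to_numpy().reshape(target_shape)

print(target_shape)  # (2, 3, 4)
```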

What you want is called advanced indexing in the official NumPy documentation.

For your example, you can do the following.

First, get the NumPy array holding the dataframe's values through df.values. Then, simply do:

df.values[[[0,2,3],[4,2,1]],:]

And you are done.

The indexing above passes two objects to the array. The first is [[0,2,3],[4,2,1]], the second is the slice :. The first indexes the first axis (rows), the second indexes the second axis (columns).

The : symbol just returns all columns.

Now, for the rows, you have a list of two lists: [[0,2,3],[4,2,1]]. This returns a 3d array made of two 2d blocks, which is what you want. The first block holds rows 0, 2 and 3, and the second holds rows 4, 2 and 1.

NumPy is powerful. You can do a lot just by leveraging indexing.

Edit: observe that you already have the list [[0,2,3],[4,2,1]] in the variable a. So df.values[a] will do it, as others mentioned. That's because the trailing : for the columns is optional in this case. But it is useful to see the full notation.
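
A quick way to convince yourself that the full and the short notation agree is to compare them directly. The sketch below uses the a and df from the question. The names full and short are just illustrative, and the last step shows how to copy the result into an array that was preallocated with zeros, if you really need to fill an existing one.

```python
import numpy as np
import pandas as pd

# Data from the question
a = np.array([[0, 2, 3], [4, 2, 1]])
df = pd.DataFrame(
    np.array([[10] * 4, [11] * 4, [12] * 4, [13] * 4, [14] * 4]),
    columns=list("ABCD"),
)

values = df.to_numpy()

full = values[a, :]   # explicit: index array for rows, slice for columns
short = values[a]     # the trailing slice is implied

print(np.array_equal(full, short))  # True

# Filling a preallocated array in place instead of creating a new one
b = np.zeros((2, 3, 4))
b[...] = short
```

The in-place assignment keeps the dtype of b (float64 here), whereas values[a] alone keeps the integer dtype of the dataframe.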
