Converting a pandas column of numpy arrays into a higher-dimensional numpy array

Question

I have a pandas dataframe of shape (75, 9).

Only one of those columns holds numpy arrays, each of which is of shape (100, 4, 3).

I have a strange phenomenon:

data = self.df[self.column_name].values[0]

is of shape (100, 4, 3), but

data = self.df[self.column_name].values

is of shape (75,), with min and max reported as 'not a numeric object'.

I expected data = self.df[self.column_name].values to be of shape (75, 100, 4, 3), with a numeric min and max.

How can I make a column of numpy arrays behave like a numpy array of a higher dimension (with length = number of rows in the dataframe)?

Reproducing:

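    import numpy as np
    import pandas as pd

    # Build a one-column frame whose cells each hold a (4, 6) array
    some_df = pd.DataFrame(columns=['A'])
    for i in range(10):
        some_df.loc[i] = [np.random.rand(4, 6)]
    print(some_df['A'].values.shape)
    print(some_df['A'].values[0].shape)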

This prints (10L,) and (4L, 6L) instead of the desired (10L, 4L, 6L) and (4L, 6L).

Answer 1

What you're asking for is not quite possible. Pandas DataFrames are 2D. Yes, you can store NumPy arrays as objects (references) inside DataFrame cells, but this is not really well supported, and expecting to get a shape which has one dimension from the DataFrame and two from the arrays inside is not possible at all.
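To make the point about object storage concrete, here is a minimal sketch. The frame and the names `cells`, `df` and `col` are made up for illustration; the cells are filled with zeros rather than the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 10 rows, each cell holding a (4, 6) array.
cells = [np.zeros((4, 6)) for _ in range(10)]
df = pd.DataFrame({"A": cells})

col = df["A"].to_numpy()

print(df["A"].dtype)   # object: the column stores references, not numbers
print(col.shape)       # (10,): one entry per row, nothing more
print(col[0].shape)    # (4, 6): the shape lives on each element
```

The outer array only knows it has ten Python objects in it, which is why its shape cannot include the inner dimensions.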

You should consider storing your data either entirely in NumPy arrays of the appropriate shape, or in a single, properly 2D DataFrame with a MultiIndex. For example, you can "pivot" a column of 1D arrays to become a column of scalars if you move the extra dimension to a new level of a MultiIndex on the rows:

  A
x [2, 3]
y [5, 6]

becomes:

     A
x 0  2
  1  3
y 0  5
  1  6

or pivot to the columns:

   A
   0  1
x  2  3
y  5  6
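Both layouts can be built from the object column. The following is only a sketch of one way to do it, assuming every cell holds a 1D array of the same length; `values`, `long_df` and `wide_df` are names invented here.

```python
import numpy as np
import pandas as pd

# The small example frame: one column of 1D arrays.
df = pd.DataFrame({"A": [np.array([2, 3]), np.array([5, 6])]}, index=["x", "y"])

# Turn the cells into a regular 2D array of shape (n_rows, array_length).
values = np.stack(df["A"].to_list())
positions = range(values.shape[1])

# Extra dimension as a new level of the row index (one scalar per row).
long_df = pd.DataFrame(
    {"A": values.ravel()},
    index=pd.MultiIndex.from_product([df.index, positions]),
)

# Extra dimension as a new level of the column index.
wide_df = pd.DataFrame(
    values,
    index=df.index,
    columns=pd.MultiIndex.from_product([["A"], positions]),
)

print(long_df)
print(wide_df)
```

Either frame is a properly 2D DataFrame of scalars, so the usual pandas reductions and selections work on it.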

Answer 2

In [42]: some_df = pd.DataFrame(columns=['A']) 
    ...: for i in range(4): 
    ...:         some_df.loc[i] = [np.random.randint(0,10,(1,3))] 
    ...:                                                                                  
In [43]: some_df                                                                          
Out[43]: 
             A
0  [[7, 0, 9]]
1  [[3, 6, 8]]
2  [[9, 7, 6]]
3  [[1, 6, 3]]

The numpy values of the column are an object dtype array, containing arrays:

In [44]: some_df['A'].to_numpy()                                                          
Out[44]: 
array([array([[7, 0, 9]]), array([[3, 6, 8]]), array([[9, 7, 6]]),
       array([[1, 6, 3]])], dtype=object)

If those arrays all have the same shape, stack does a nice job of concatenating them on a new dimension:

In [45]: np.stack(some_df['A'].to_numpy())                                                
Out[45]: 
array([[[7, 0, 9]],

       [[3, 6, 8]],

       [[9, 7, 6]],

       [[1, 6, 3]]])
In [46]: _.shape                                                                          
Out[46]: (4, 1, 3)

This only works with one column. Like all the concatenate functions, stack treats its input argument as an iterable, effectively a list of arrays.

In [48]: some_df['A'].to_list()                                                           
Out[48]: 
[array([[7, 0, 9]]),
 array([[3, 6, 8]]),
 array([[9, 7, 6]]),
 array([[1, 6, 3]])]
In [50]: np.stack(some_df['A'].to_list()).shape                                           
Out[50]: (4, 1, 3)
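Applied to the layout from the question, the same idea might look like the sketch below. `column_to_ndarray` is a hypothetical helper, not a pandas or NumPy function, and the explicit shape check is an addition that only serves to give a clearer error than the one np.stack raises on mismatched shapes.

```python
import numpy as np
import pandas as pd


def column_to_ndarray(df, column):
    """Stack a column of equally shaped arrays into one (n_rows, ...) array."""
    arrays = df[column].to_list()
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        # np.stack would also raise ValueError here; this message names the shapes.
        raise ValueError(f"cannot stack arrays with shapes {sorted(shapes)}")
    return np.stack(arrays)


# Same layout as the question's reproduction: 10 rows, each cell a (4, 6) array.
some_df = pd.DataFrame({"A": [np.random.rand(4, 6) for _ in range(10)]})

data = column_to_ndarray(some_df, "A")
print(data.shape)              # (10, 4, 6)
print(data.min(), data.max())  # plain floats, since data has a numeric dtype
```

The result is an independent copy, so changes to `data` are not written back to the DataFrame.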


Answer 1
1 2019-06-16 10:15:54

Answer 2
1 Accepted 2019-06-16 15:34:36
