Pandas - Group by column and convert the data to a numpy array

Question

Given the following DataFrame, group A has 4 samples, B has 3 samples and C has 1 sample:

  group   data_1   data_2
0     A        1        4
1     A        2        5
2     A        3        6
3     A        4        7
4     B        1        4
5     B        2        5
6     B        3        6
7     C        1        4
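To make the snippets in this thread reproducible, the frame can be rebuilt from the printout above. This is a minimal sketch that assumes the default integer index and integer data columns shown; the name `counts` is illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the printout in the question
df = pd.DataFrame({
    'group':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# Samples per group (A: 4, B: 3, C: 1); the largest group sets the padded length
counts = df.groupby('group').size()
print(counts.to_dict())
```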

I would like to transform the data into a numpy array where each row is one group containing all of its samples, with zero padding for groups that have fewer samples.

The result should be an array like this:

[
   [[1,4],[2,5],[3,6],[4,7]], # this is A group 4 samples
   [[1,4],[2,5],[3,6],[0,0]], # this is B group 3 samples
   [[1,4],[0,0],[0,0],[0,0]], # this is C group 1 sample
]

Answer 1

First, the missing values have to be added. The first solution does this with unstack and stack, using a counter Series created by cumcount.
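The intermediate steps of the first solution can be inspected on their own. This sketch assumes the example frame from the question; the names `wide` and `padded` are illustrative, and on pandas 2.1+ `stack()` may emit a FutureWarning about its new implementation, which does not change the result here.

```python
import pandas as pd

df = pd.DataFrame({
    'group':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# cumcount numbers the rows inside each group: 0, 1, 2, ...
g = df.groupby('group').cumcount()
print(g.tolist())  # [0, 1, 2, 3, 0, 1, 2, 0]

# With (group, position) as the index, unstack spreads the positions into columns;
# positions a group does not have are filled with 0 instead of NaN
wide = df.set_index(['group', g]).unstack(fill_value=0)
print(wide.shape)  # (3, 8): 3 groups x (2 data columns * 4 positions)

# stack moves the position level back into the index, now 4 rows per group
padded = wide.stack()
print(padded.loc['C'].values.tolist())  # [[1, 4], [0, 0], [0, 0], [0, 0]]
```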

The second solution uses reindex with a MultiIndex.
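The MultiIndex in the second solution is the full grid of (group, position) pairs, so reindexing against it inserts the missing rows. A minimal sketch on the question's frame; `padded` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'group':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})
g = df.groupby('group').cumcount()

# Every (group, position) pair: 3 groups x 4 positions = 12 index entries
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
print(len(mux))  # 12

# reindex keeps the existing rows and adds the missing pairs, filled with 0
padded = df.set_index(['group', g]).reindex(mux, fill_value=0)
print(padded.loc['C'].values.tolist())  # [[1, 4], [0, 0], [0, 0], [0, 0]]
```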

Finally, use a lambda function with groupby to convert each group to a numpy array with values and then to nested lists:

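import pandas as pd

# counter of samples inside each group: 0, 1, 2, ... per group
g = df.groupby('group').cumcount()
L = (df.set_index(['group',g])
       .unstack(fill_value=0)      # missing (group, position) cells become 0
       .stack().groupby(level=0)   # back to long form, now padded per group
       .apply(lambda x: x.values.tolist())
       .tolist())
print(L)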

[[[1, 4], [2, 5], [3, 6], [4, 7]], 
 [[1, 4], [2, 5], [3, 6], [0, 0]], 
 [[1, 4], [0, 0], [0, 0], [0, 0]]]
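The answer stops at nested lists, while the question asks for a numpy array. Because every group is padded to the same length, the lists form a regular 3-D array. This sketch simply wraps the first solution's result in `np.array`; `arr` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# Padded nested lists, built as in the first solution
g = df.groupby('group').cumcount()
L = (df.set_index(['group', g])
       .unstack(fill_value=0)
       .stack()
       .groupby(level=0)
       .apply(lambda x: x.values.tolist())
       .tolist())

# Shape is (number of groups, largest group size, number of data columns)
arr = np.array(L)
print(arr.shape)  # (3, 4, 2)
```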

Another solution:

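import pandas as pd

g = df.groupby('group').cumcount()
# full grid of (group, position) pairs, so every group gets as many rows as the largest
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
L = (df.set_index(['group',g])
       .reindex(mux, fill_value=0)             # missing pairs are inserted as 0 rows
       .groupby(level=0)[['data_1','data_2']]  # select several columns with a list
       .apply(lambda x: x.values.tolist())
       .tolist()
)
print(L)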
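Not part of the answer: if the intermediate pandas reshaping is not needed, a swapped-in plain NumPy approach can build the padded array directly by preallocating zeros and scattering each row into its (group, position) slot. This is a sketch under stated assumptions: groups are ordered by first appearance (`pd.factorize` order), which for this data matches the sorted order that groupby uses, and the names `codes`, `pos`, `values` and `out` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
    'data_2': [4, 5, 6, 7, 4, 5, 6, 4],
})

# Integer code of each row's group (A -> 0, B -> 1, C -> 2, by first appearance)
codes, groups = pd.factorize(df['group'])
# Position of each row inside its group
pos = df.groupby('group').cumcount().to_numpy()

values = df[['data_1', 'data_2']].to_numpy()
# Zero-filled target: (number of groups, largest group size, number of columns)
out = np.zeros((len(groups), pos.max() + 1, values.shape[1]), dtype=values.dtype)
# Scatter every row into its slot; slots that are never written stay 0
out[codes, pos] = values
print(out.shape)  # (3, 4, 2)
```

Compared with the pandas solutions, this avoids building the intermediate padded frame, but the group order must be checked if the data is not already sorted by group.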

Answer 1: accepted, score 8, 2018-10-03 07:10:18