Pandas groups to numpy arrays, including the group information

Question

I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'groupId': [11,35,46,11,26,25,39,50,55],
    'type': [1,1,1,1,1,2,2,2,2],
})

I want to turn these groups into a list of numpy arrays that also include the type value. I tried:

df.groupby(['id','type'])['groupId'].apply(np.array).tolist()

That is almost there, but I also want the type value at the start of each numpy array. What I want is:

[
np.array([1,11,35,46]),
np.array([1,11,26]),
np.array([2,25,39,50,55])
]

I thought this would be easy, but I'm stuck.

Answer 1

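Use x.name to get the type value and prepend it to the np.array. Because the grouping is on ['id', 'type'], x.name is the (id, type) tuple for each group, so x.name[1] is the type: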

a = df.groupby(['id','type'])['groupId'].apply(lambda x: np.array([x.name[1], *x])).tolist()
print (a)
[array([ 1, 11, 35, 46], dtype=int64),
 array([ 1, 11, 26], dtype=int64),
 array([ 2, 25, 39, 50, 55], dtype=int64)]
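
As a hedged alternative sketch of the same idea, the groups can also be iterated directly instead of going through apply and x.name. Iterating a grouped Series yields (key, values) pairs, and with two grouping columns the key is the (id, type) tuple. The variable name `arrays` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'groupId': [11, 35, 46, 11, 26, 25, 39, 50, 55],
    'type': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Each iteration gives the (id, type) key and the groupId values of that group.
# key[1] is the type, which is placed at the start of the array.
arrays = [
    np.array([key[1], *values])
    for key, values in df.groupby(['id', 'type'])['groupId']
]

print([a.tolist() for a in arrays])
# [[1, 11, 35, 46], [1, 11, 26], [2, 25, 39, 50, 55]]
```

This gives the same list of arrays as the apply version; which one reads better is mostly a matter of taste.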

Answer 2

First group by id and type, aggregating groupId into a list. Then assign a new group column that combines type and groupId, flattening the nested list along the way.

df = df.groupby(['id', 'type'], as_index=False).agg({
    'groupId': list
})
df

  id  type           groupId
0  A     1      [11, 35, 46]
1  B     1          [11, 26]
2  C     2  [25, 39, 50, 55]

A flatten helper, taken from this link:

def flatten(foo):
    # Recursively yield the leaves of a nested iterable; strings are kept whole.
    for x in foo:
        if hasattr(x, '__iter__') and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x
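
To make the helper's behavior concrete, here is a small standalone sketch that runs it on a nested list. The sample list `nested` is made up for illustration and is not part of the original data.

```python
def flatten(items):
    # Recursively yield the leaves of a nested iterable; strings are kept whole.
    for x in items:
        if hasattr(x, '__iter__') and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x

# Hypothetical sample input: a scalar, a nested list, and a string.
nested = [1, [11, 35, [46]], 'ab']
flat = list(flatten(nested))
print(flat)  # [1, 11, 35, 46, 'ab']
```

Applied to a row such as (1, [11, 35, 46]), the same logic produces [1, 11, 35, 46].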

Then you can create a flat list of type and groupId:

df = df.assign(group=df[['type', 'groupId']].apply(lambda x: list(flatten(x)), axis = 1))
df

  id  type           groupId                group
0  A     1      [11, 35, 46]      [1, 11, 35, 46]
1  B     1          [11, 26]          [1, 11, 26]
2  C     2  [25, 39, 50, 55]  [2, 25, 39, 50, 55]

df['group'].apply(np.array).tolist()

[array([ 1, 11, 35, 46]), array([ 1, 11, 26]), array([ 2, 25, 39, 50, 55])]
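
Since each row here holds a single scalar type and one flat list, a general recursive flatten is not strictly required. The sketch below is one possible shortcut, assuming the same dataframe as in the question: it zips the type and groupId columns of the aggregated frame and prepends the type directly. The names `grouped` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'groupId': [11, 35, 46, 11, 26, 25, 39, 50, 55],
    'type': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# One row per (id, type) pair, with the groupId values collected into a list.
grouped = df.groupby(['id', 'type'], as_index=False).agg({'groupId': list})

# Prepend each row's type to its list of groupId values.
result = [
    np.array([t, *g])
    for t, g in zip(grouped['type'], grouped['groupId'])
]

print([r.tolist() for r in result])
# [[1, 11, 35, 46], [1, 11, 26], [2, 25, 39, 50, 55]]
```

This skips the intermediate group column entirely, which may be preferable when the flattened lists are not needed in the dataframe itself.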


Answer 1: 3 (accepted), 2022-01-10 09:09:25
Answer 2: 0, 2022-01-10 09:36:41
