Efficient way to union non-set iterables within groups

Question

I have this df

import pandas as pd

df = pd.DataFrame(dict(
        A=['b', 'a', 'b', 'c', 'a', 'c', 'a', 'c', 'a', 'a'],
        B=[[0, 2, 3, 1],
           [9, 6, 7, 2],
           [6, 0, 1, 4],
           [9, 2, 5, 1],
           [5, 1, 4, 8],
           [8, 5, 6, 6],
           [0, 9, 0, 0],
           [2, 6, 1, 8],
           [7, 3, 2, 6],
           [8, 7, 1, 9]]
        ))

I want to group by 'A' and union all the lists in 'B'.

Neither df.groupby('A').B.union() nor df.groupby('A').B.apply(set.union) works.

I want the result to be

A
a    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b                {0, 1, 2, 3, 4, 6}
c                {1, 2, 5, 6, 8, 9}
Name: B, dtype: object

Answer 1

The problem is that you need to cast the lists to sets before applying the union. One solution is to use sum to concatenate the lists in each group, then cast each result to a set using map:

In [28]: df.groupby('A').B.sum().map(set)
Out[28]:
A
a    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b                {0, 1, 2, 3, 4, 6}
c                {1, 2, 5, 6, 8, 9}
dtype: object
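
What sum does within each group can be mimicked on plain lists. The sketch below is only an illustration under that assumption, not pandas internals: it uses the built-in sum with an empty-list start value on the two 'b' rows from the question. The names group_b, concatenated and union_b are made up for the example.

```python
# The two lists that belong to group 'b' in the question's df.
group_b = [[0, 2, 3, 1], [6, 0, 1, 4]]

# Adding lists concatenates them; duplicates are kept at this stage.
concatenated = sum(group_b, [])
print(concatenated)        # [0, 2, 3, 1, 6, 0, 1, 4]

# Casting to a set is the step that removes the duplicates.
union_b = set(concatenated)
print(sorted(union_b))     # [0, 1, 2, 3, 4, 6]
```

The deduplicated result matches the 'b' row of the expected output.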

Answer 2

maxymoo's answer is nice, but since it first concatenates all the lists, it may use a lot of memory unnecessarily (especially if there are many duplicates).
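
To make the memory point concrete, here is a rough sketch on plain lists using the five 'a' rows from the question. It compares the size of the concatenated intermediate with the size of a set that is grown one list at a time. The variable names are hypothetical, and element counts are used only as a stand-in for memory use.

```python
# The five lists that belong to group 'a' in the question's df.
group_a = [[9, 6, 7, 2], [5, 1, 4, 8], [0, 9, 0, 0], [7, 3, 2, 6], [8, 7, 1, 9]]

# Concatenating first builds an intermediate holding every element,
# duplicates included.
concatenated = sum(group_a, [])
print(len(concatenated))   # 20

# Accumulating into a set never holds more than the distinct values.
seen = set()
for lst in group_a:
    seen.update(lst)
print(len(seen))           # 10
```

With this toy data the difference is small, but it grows with the number of repeated values per group.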

Instead, you should first convert column B to sets, after which you can reduce each group to a single set much more efficiently. Like this:

df['B'] = df['B'].map(set)

   A             B
0  b  {0, 1, 2, 3}
1  a  {9, 2, 6, 7}
2  b  {0, 1, 4, 6}
3  c  {9, 2, 5, 1}
4  a  {8, 1, 4, 5}
5  c     {8, 5, 6}
6  a        {0, 9}
7  c  {8, 1, 2, 6}
8  a  {2, 3, 6, 7}
9  a  {8, 1, 9, 7}

from functools import reduce  # reduce is not a builtin in Python 3

df.groupby('A').B.apply(lambda x: reduce(set.union, x))

A
a    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b                {0, 1, 2, 3, 4, 6}
c                {1, 2, 5, 6, 8, 9}
Name: B, dtype: object

Or, as a one-liner, as maxymoo points out:

df.groupby('A').B.apply(lambda x: reduce(set.union, x.map(set)))
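
The reason the conversion to sets has to come first can be shown on plain lists. This is a minimal sketch using the three 'c' rows from the question; the names group_c, failed and union_c are made up, and the set() initializer passed to reduce is an addition that is not part of the answer above.

```python
from functools import reduce  # reduce is not a builtin in Python 3

# The three lists that belong to group 'c' in the question's df.
group_c = [[9, 2, 5, 1], [8, 5, 6, 6], [2, 6, 1, 8]]

# set.union is called unbound here, so its first argument must be a set.
# Reducing over raw lists therefore raises TypeError.
try:
    reduce(set.union, group_c)
    failed = False
except TypeError:
    failed = True
print(failed)              # True

# Converting each list to a set first makes the reduction valid.
# The set() initializer (an assumption of this sketch) keeps reduce
# from raising on an empty group.
union_c = reduce(set.union, map(set, group_c), set())
print(sorted(union_c))     # [1, 2, 5, 6, 8, 9]
```

The same TypeError is presumably what makes apply(set.union) in the question fail, since there the method also receives something other than a set as its first argument.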

Answer 3

I'd write a function and pass it to apply:

def f(x):
    # grabbing first one so I can
    # make a set out of it
    first, *rest = x.values.tolist()
    # union won't work unless it's on
    # a set, it doesn't care about the rest
    return set(first).union(*rest)

df.groupby('A').B.apply(f)

A
a    {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b                {0, 1, 2, 3, 4, 6}
c                {1, 2, 5, 6, 8, 9}
Name: B, dtype: object
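
The trick in f relies on the fact that only the object union is called on has to be a set; the arguments can be any iterables. A small sketch on plain lists, using the two 'b' rows from the question, shows this along with a variant that starts from an empty set. The variant and all variable names are assumptions of this sketch, not part of the answer.

```python
# The two lists that belong to group 'b' in the question's df.
group_b = [[0, 2, 3, 1], [6, 0, 1, 4]]

# Same idea as f: convert only the first list, pass the rest as-is.
first, *rest = group_b
print(sorted(set(first).union(*rest)))   # [0, 1, 2, 3, 4, 6]

# Variant: start from an empty set, so no unpacking into first/rest
# is needed and an empty group yields an empty set.
union_b = set().union(*group_b)
empty = set().union(*[])
print(sorted(union_b))                   # [0, 1, 2, 3, 4, 6]
print(empty)                             # set()
```

Inside f, the variant would correspond to returning set().union(*x), which avoids the intermediate tolist() call; whether that matters in practice depends on the data.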

Answer 1: 5 votes, accepted, 2016-11-23 22:26:58
Answer 2: 2 votes, 2016-11-23 22:55:04
Answer 3: 1 vote
