Aggregate unique values of a column and count them, grouped by multiple columns - pandas

Question

ID col1 col2    col3
I1 1    0       1 
I2 1    0       1 
I3 0    1       0 
I4 0    1       0 
I5 0    0       1

This is my dataframe. I want to aggregate the ID values into a list for each group defined by col1, col2 and col3, and I also want a count column alongside it.
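For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the dtypes (strings for ID, integers for the other columns) are assumed from how the table is printed.

```python
import pandas as pd

# Sample frame reconstructed from the table in the question.
# Dtypes are an assumption: ID as strings, col1..col3 as integers.
df = pd.DataFrame({
    "ID": ["I1", "I2", "I3", "I4", "I5"],
    "col1": [1, 1, 0, 0, 0],
    "col2": [0, 0, 1, 1, 0],
    "col3": [1, 1, 0, 0, 1],
})
print(df)
```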

Expected output:

ID_List      Count 
[I1,I2]       2
[I3,I4]       2
[I5]          1

My code:

cols_to_group = ['col1','col2','col3']
data = pd.DataFrame(df.groupby(cols_to_group)['ID'].nunique()).reset_index(drop=True)
data.head()

   ID
0  2
1  2
2  1

Answer 1

You can use groupby.agg():

df.groupby(['col1','col2','col3'], sort=False).ID.agg([list,'count'])

Output:

                    list  count
col1 col2 col3                 
1    0    1     [I1, I2]      2
0    1    0     [I3, I4]      2
     0    1         [I5]      1
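If the exact column names from the expected output (ID_List and Count) are wanted, pandas named aggregation is one way to get them in the same call. A minimal sketch, assuming pandas 0.25 or later and the sample frame rebuilt from the question:

```python
import pandas as pd

# Sample frame rebuilt from the question (dtypes assumed).
df = pd.DataFrame({
    "ID": ["I1", "I2", "I3", "I4", "I5"],
    "col1": [1, 1, 0, 0, 0],
    "col2": [0, 0, 1, 1, 0],
    "col3": [1, 1, 0, 0, 1],
})

# Named aggregation: output_column=(input_column, function).
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby(["col1", "col2", "col3"], sort=False)
      .agg(ID_List=("ID", list), Count=("ID", "count"))
      .reset_index(drop=True)  # drop the group keys, as in the expected output
)
print(result)
```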

Answer 2

You need to aggregate with a function such as sum or count. In this case, use count together with list. Try the code below.

df.groupby(['col1','col2','col3'], sort=False).ID.agg([list,'count']).reset_index(drop=True)

Output:

    list    count
0   [I1, I2]    2
1   [I3, I4]    2
2   [I5]    1
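Row order depends on the sort argument: by default groupby sorts the result by the group keys, while sort=False keeps groups in order of first appearance. A small sketch comparing the two, using the sample frame rebuilt from the question (the variable names are just for illustration):

```python
import pandas as pd

# Sample frame rebuilt from the question (dtypes assumed).
df = pd.DataFrame({
    "ID": ["I1", "I2", "I3", "I4", "I5"],
    "col1": [1, 1, 0, 0, 0],
    "col2": [0, 0, 1, 1, 0],
    "col3": [1, 1, 0, 0, 1],
})
keys = ["col1", "col2", "col3"]

# Default (sort=True): groups are ordered by key, so (0, 0, 1) comes first.
by_key = df.groupby(keys).ID.agg([list, "count"]).reset_index(drop=True)

# sort=False: groups are ordered by first appearance in the frame.
by_appearance = (
    df.groupby(keys, sort=False).ID.agg([list, "count"]).reset_index(drop=True)
)

print(by_key)
print(by_appearance)
```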

Answer 3

Here you go:

grouped = df.groupby(['col1', 'col2', 'col3'], sort=False).ID
df = pd.DataFrame({
    'ID_List': grouped.aggregate(list),
    'Count': grouped.count()
}).reset_index(drop=True)
print(df)

Output:

    ID_List  Count
0  [I1, I2]      2
1  [I3, I4]      2
2      [I5]      1
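The title asks about unique values. In the sample data every ID is already distinct, so list and count are enough. If an ID could repeat within a group, something along these lines might be closer to what is wanted. This is a sketch on a hypothetical variant of the data in which I1 appears twice; it is not part of the original question.

```python
import pandas as pd

# Hypothetical variant of the question's data: I1 is duplicated.
df = pd.DataFrame({
    "ID": ["I1", "I1", "I2", "I3", "I4", "I5"],
    "col1": [1, 1, 1, 0, 0, 0],
    "col2": [0, 0, 0, 1, 1, 0],
    "col3": [1, 1, 1, 0, 0, 1],
})

result = (
    df.groupby(["col1", "col2", "col3"], sort=False)
      .agg(
          # Series.unique() keeps order of first appearance within the group.
          ID_List=("ID", lambda s: list(s.unique())),
          # nunique counts distinct values, unlike count.
          Count=("ID", "nunique"),
      )
      .reset_index(drop=True)
)
print(result)
```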

3 answers

Answer 1
2 2020-06-24 16:39:07

Answer 2
0 2020-06-24 16:45:29

Answer 3
0 2020-06-24 17:07:31
