pandas - Remove duplicates in one column, count the duplicates, and aggregate another column

Question

I'm trying to remove duplicate values in the ID column, count how many times each ID occurs and store that in a new column called Count, and concatenate the Axis values for each ID.

This is my current DataFrame:

ID    Axis    
1   1 2 3 4 
1   0 1 2 3 
1   4 5 2 4 
2   7 8 9 10 
2   1 2 3 4 
3   6 7 8 9 
4   1 2 3 4 
4   0 1 2 3

Desired output:


 ID  count  Axis    
 1    3    [1 2 3 4 , 0 1 2 3 ,  4 5 2 4]
 2    2    [ 7 8 9 10 ,  1 2 3 4] 
 3    1    [6 7 8 9 ]
 4    2    [1 2 3 4 , 0 1 2 3]

I know I'm supposed to use an aggregate function, but I can't figure out how. If someone can guide me, I would really appreciate it.

Answer 1

Use:

df2 = df.groupby('ID').agg(list)
df2['count'] = df2['Axis'].apply(len)
print(df2)

which gives:

                                          Axis  count
ID                                                   
1   [[1, 2, 3, 4], [0, 1, 2, 3], [4, 5, 2, 4]]      3
2                [[7, 8, 9, 10], [1, 2, 3, 4]]      2
3                               [[6, 7, 8, 9]]      1
4                 [[1, 2, 3, 4], [0, 1, 2, 3]]      2

for the DataFrame:

  ID           Axis
0   1   [1, 2, 3, 4]
1   1   [0, 1, 2, 3]
2   1   [4, 5, 2, 4]
3   2  [7, 8, 9, 10]
4   2   [1, 2, 3, 4]
5   3   [6, 7, 8, 9]
6   4   [1, 2, 3, 4]
7   4   [0, 1, 2, 3]
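
The two steps above can also be written as a single named aggregation. This is only a sketch: it assumes the Axis column holds plain strings, as in the question, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data mirroring the question; Axis is assumed to hold plain strings.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 4, 4],
    "Axis": ["1 2 3 4", "0 1 2 3", "4 5 2 4", "7 8 9 10",
             "1 2 3 4", "6 7 8 9", "1 2 3 4", "0 1 2 3"],
})

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function).
out = df.groupby("ID", as_index=False).agg(
    count=("Axis", "count"),  # number of non-null Axis rows per ID
    Axis=("Axis", list),      # collect the Axis values of each ID into a list
)
print(out)
```

With as_index=False, ID stays a regular column, which matches the layout of the desired output in the question.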

Answer 2

out = df.groupby('ID')['Axis'].agg(['count', ('Axis', lambda x: list(x))]).reset_index()

out

    ID  count   Axis
0   1   3   [1 2 3 4 , 0 1 2 3 , 4 5 2 4 ]
1   2   2   [7 8 9 10 , 1 2 3 4 ]
2   3   1   [6 7 8 9 ]
3   4   2   [1 2 3 4 , 0 1 2 3 ]

Answer 3

Your dataframe can be obtained by:

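import numpy as np
import pandas as pd

# np.array stores mixed int/str rows as strings, so the IDs end up as '1', '2', ...
df = pd.DataFrame(data=np.array([[1, "1 2 3 4"],
                                 [1, "0 1 2 3"],
                                 [1, "4 5 2 4"],
                                 [2, "1 2 3 4"],
                                 [2, "7 8 9 10"],
                                 [2, "1 2 3 4"],
                                 [3, "6 7 8 9"],
                                 [4, "1 2 3 4"],
                                 [4, "0 1 2 3"],
                                 ]), columns=['ID', 'Axis']).set_index('ID')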

The following solution will get you the desired result:

grouped = df.groupby("ID")["Axis"]
df1 = pd.DataFrame()
df1["count"] = grouped.count()                         # rows per ID, repeats included
df1["Axis"] = grouped.agg(lambda x: list(x.unique()))  # distinct Axis values per ID

The result is:

ID  count   Axis    
1   3   [1 2 3 4, 0 1 2 3, 4 5 2 4]
2   3   [1 2 3 4, 7 8 9 10]
3   1   [6 7 8 9]
4   2   [1 2 3 4, 0 1 2 3]
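
Note that in the result above ID 2 has a count of 3 but only two entries in Axis, because unique() drops the repeated "1 2 3 4" while count() still counts every row. If the count should instead match the deduplicated list, nunique() may be what is wanted. A minimal sketch on a reduced sample (the column name n_unique is made up for illustration):

```python
import pandas as pd

# Reduced sample: ID 2 contains the value "1 2 3 4" twice.
df = pd.DataFrame({
    "ID": [2, 2, 2, 3],
    "Axis": ["1 2 3 4", "7 8 9 10", "1 2 3 4", "6 7 8 9"],
})

grouped = df.groupby("ID")["Axis"]
out = pd.DataFrame({
    "count": grouped.count(),        # every row per ID, repeats included
    "n_unique": grouped.nunique(),   # distinct Axis values per ID (hypothetical column name)
    "Axis": grouped.agg(lambda s: list(s.unique())),  # distinct values, in order of appearance
})
print(out)
```

Which of the two counts is the right one depends on whether repeated Axis values for the same ID should be treated as duplicates.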
