How to apply a function to groups in pandas

Question

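I have a DataFrame that I'm currently grouping by a column, say column_a. Within each group, I'm trying to take the unique column_b values together with their associated column_c values and compute the column_b-weighted average of column_c.

Like this: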

(10.2*4 + 12.4*4 + 4.5*5) / (10.2+12.4+4.5) = 112.9 / 27.1 = 4.166
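The calculation above is a weighted mean of column_c with the unique column_b values as weights, i.e. sum(b * c) / sum(b). As a minimal check, the three (column_b, column_c) pairs from the formula can be passed to NumPy's np.average, whose weights argument computes this same ratio:

```python
import numpy as np

# The (column_b, column_c) pairs that appear in the formula above
b = np.array([10.2, 12.4, 4.5])
c = np.array([4, 4, 5])

# Weighted mean of c with weights b, written out by hand: sum(b * c) / sum(b)
manual = (b * c).sum() / b.sum()

# The same quantity via NumPy's built-in weighted average
weighted = np.average(c, weights=b)

print(round(manual, 3), round(weighted, 3))  # both about 4.166
```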

A sample of the data looks like this:

import pandas as pd

df = pd.DataFrame({"column_a": [1,1,1,1,1,1,1,1,1],
                   "column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
                   "column_c": [4,4,4,4,4,4,4,5,5]})
df

    column_a    column_b    column_c
0      1         10.2         4
1      1         10.2         4
2      1         10.2         4
3      1         12.4         4
4      1         12.4         4
5      1         12.4         4
6      1         12.4         4
7      1         4.5          5
8      1         4.5          5
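In this sample, column_b has three distinct values but column_c has only two, so calling unique() on each column separately yields arrays of different lengths that no longer line up row by row. Deduplicating the two columns together as pairs keeps each column_b value next to its column_c value. A quick check, assuming pandas is installed:

```python
import pandas as pd

df = pd.DataFrame({"column_a": [1, 1, 1, 1, 1, 1, 1, 1, 1],
                   "column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
                   "column_c": [4, 4, 4, 4, 4, 4, 4, 5, 5]})

# unique() works on each column independently, so the lengths differ
n_unique_b = len(df["column_b"].unique())  # 3
n_unique_c = len(df["column_c"].unique())  # 2

# Deduplicating (column_b, column_c) as pairs keeps the values aligned:
# three rows, (10.2, 4), (12.4, 4), (4.5, 5)
pairs = df[["column_b", "column_c"]].drop_duplicates()
print(n_unique_b, n_unique_c)
print(pairs)
```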

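This is what I tried. Unfortunately, column_c has fewer unique items than column_b (two versus three), so the two arrays don't line up. How can I fix this?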

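import numpy as np

g = df.groupby("column_a")

def func1(row):
    # row is actually the sub-DataFrame for one column_a group
    unique_b = row["column_b"].unique()   # 3 values: 10.2, 12.4, 4.5
    unique_c = row["column_c"].unique()   # only 2 values: 4, 5
    aggregated_b = sum(unique_b)
    # fails here: unique_c and unique_b have different lengths
    aggregated = np.dot(unique_c, unique_b)/aggregated_b
    return aggregated

g.apply(func1)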
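One way to keep the structure of func1 from the question is to deduplicate the (column_b, column_c) pairs inside the function instead of calling unique() on each column separately. This is a sketch, assuming the same sample data; the parameter is renamed to group because apply passes a sub-DataFrame, not a row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_a": [1, 1, 1, 1, 1, 1, 1, 1, 1],
                   "column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
                   "column_c": [4, 4, 4, 4, 4, 4, 4, 5, 5]})


def func1(group):
    # Deduplicate (column_b, column_c) as pairs so each b stays next to its c
    pairs = group[["column_b", "column_c"]].drop_duplicates()
    unique_b = pairs["column_b"].to_numpy()
    unique_c = pairs["column_c"].to_numpy()
    # Weighted average of column_c, weighted by column_b
    return np.dot(unique_b, unique_c) / unique_b.sum()


# Selecting the value columns first keeps column_a out of each group
result = df.groupby("column_a")[["column_b", "column_c"]].apply(func1)
print(result)
```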

Answer 1

It looks like you want groupby + apply:

(df.drop_duplicates() # you should restrict the columns here if you have more
   .groupby('column_a')
   .apply(lambda g: (g['column_b']*g['column_c']).sum()/g['column_b'].sum())
)

Output:

column_a
1    4.166052
dtype: float64
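If there are many groups, a variant that avoids calling a Python function per group is to compute the product column once and let groupby sum both the numerator and the denominator. This is a swapped-in vectorized alternative, not what the answer above uses, and it assumes the same sample data; the helper column name bc is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"column_a": [1, 1, 1, 1, 1, 1, 1, 1, 1],
                   "column_b": [10.2, 10.2, 10.2, 12.4, 12.4, 12.4, 12.4, 4.5, 4.5],
                   "column_c": [4, 4, 4, 4, 4, 4, 4, 5, 5]})

# One row per distinct (column_a, column_b, column_c) combination
dedup = df.drop_duplicates(subset=["column_a", "column_b", "column_c"])

# Per group: sum(b * c) and sum(b), both from the built-in groupby sum
sums = (dedup.assign(bc=dedup["column_b"] * dedup["column_c"])
             .groupby("column_a")[["bc", "column_b"]]
             .sum())

# Weighted average of column_c per column_a group
result = sums["bc"] / sums["column_b"]
print(result)
```

Including column_a in the subset matters: if the same (column_b, column_c) pair appears in two different groups, it is kept once in each group rather than only in the first.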

Answer 1
1 2021-11-29 16:13:56

如何在 pandas 组中应用函数

问题描述

1 个解决方案

解决方案1 1 2021-11-29 16:13:56

解决方案1
1 2021-11-29 16:13:56