Is there a way to multiply the values in two columns while grouping by the values in a third column using pandas?
I'm trying to avoid using a loop while calculating the weighted average mark for each of these courses.
I just can't wrap my head around what to do. I assume I can use groupby and perform the appropriate calculations?
This is the dataframe:
data =
 mark  weight  course_id
   78      10          1
   87      40          1
   15      50          1
   78      90          3
   40      10          3
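For anyone who wants to run the snippets in the answers, the frame above can be rebuilt roughly like this. It is a minimal sketch: it assumes all three columns are plain integers and keeps the name data from the question.

```python
import pandas as pd

# Rebuild the question's frame; integer dtypes are an assumption.
data = pd.DataFrame(
    {
        "mark": [78, 87, 15, 78, 40],
        "weight": [10, 40, 50, 90, 10],
        "course_id": [1, 1, 1, 3, 3],
    }
)
print(data)
```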
This is the desired result:
result =
 course_id  course_average
         1            50.1
         3            74.2
This is one way to go about it:
(data.assign(course_average=data.mark * data.weight)  # mark * weight per row
     .groupby("course_id")
     # sum of weighted marks divided by sum of weights, per course
     .pipe(lambda x: x.course_average.sum().div(x.weight.sum()))
     .reset_index(name="course_average"))
   course_id  course_average
0          1            50.1
1          3            74.2
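For reference, the quantity being computed is the ordinary weighted mean. The symbols are introduced here and are not in the original post: m_i and w_i stand for the mark and weight of row i, and c for a course. Worked through for the two courses in the question:

```latex
\bar{m}_c = \frac{\sum_{i \in c} w_i \, m_i}{\sum_{i \in c} w_i},
\qquad
\bar{m}_1 = \frac{10 \cdot 78 + 40 \cdot 87 + 50 \cdot 15}{10 + 40 + 50} = \frac{5010}{100} = 50.1,
\qquad
\bar{m}_3 = \frac{90 \cdot 78 + 10 \cdot 40}{90 + 10} = \frac{7420}{100} = 74.2
```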
If the weights don't always add up to 100 for each group, you can calculate each row's share of its group's total weight and multiply it by mark.
(data.assign(wa=data['mark'] * data['weight'] /
                data.groupby('course_id')['weight'].transform('sum'))
     .groupby('course_id')['wa'].sum())
Out[1]:
course_id
1 50.1
3 74.2
Name: wa, dtype: float64
If the weights do add up to 100 for each group, the calculation is simpler:
data.assign(wa = data['mark'] * data['weight'] / 100).groupby('course_id')['wa'].sum()
Out[2]:
course_id
1 50.1
3 74.2
Name: wa, dtype: float64
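To see why the shortcut depends on the weights summing to 100, here is a small check on made-up data. The frame other and its weights 1, 4, 5 are hypothetical and not from the question. It compares the normalised calculation with the divide-by-100 one.

```python
import pandas as pd

# Hypothetical data: same marks as course 1, but the weights sum to 10, not 100.
other = pd.DataFrame(
    {"mark": [78, 87, 15], "weight": [1, 4, 5], "course_id": [1, 1, 1]}
)

# Normalise by each group's own weight total.
group_total = other.groupby("course_id")["weight"].transform("sum")
normalised = (
    (other["mark"] * other["weight"] / group_total)
    .groupby(other["course_id"])
    .sum()
)

# Shortcut that assumes every group's weights sum to 100.
shortcut = (
    (other["mark"] * other["weight"] / 100)
    .groupby(other["course_id"])
    .sum()
)

print(normalised)
print(shortcut)
```

Here the normalised version gives about 50.1 for course 1, while the shortcut gives about 5.01, so the shortcut is only safe when the weights are known to total 100 in every group.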
You can do this with a simple one-liner using groupby and a lambda to compute the weighted average:
data.groupby(['course_id']).apply(lambda x: sum(x['mark']*x['weight'])/sum(x['weight']))
course_id
1 50.1
3 74.2
dtype: float64
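A variant of the same idea can be sketched with numpy.average, which accepts a weights argument. The helper name weighted_mean_by_course is made up for illustration, and the frame is rebuilt from the question so the block runs on its own. Selecting the two value columns before apply is a choice made here so that the lambda only sees mark and weight.

```python
import numpy as np
import pandas as pd

# Frame from the question, rebuilt so this block is self-contained.
data = pd.DataFrame(
    {
        "mark": [78, 87, 15, 78, 40],
        "weight": [10, 40, 50, 90, 10],
        "course_id": [1, 1, 1, 3, 3],
    }
)


def weighted_mean_by_course(frame):
    """Weighted mean of mark per course_id (hypothetical helper)."""
    return frame.groupby("course_id")[["mark", "weight"]].apply(
        lambda g: np.average(g["mark"], weights=g["weight"])
    )


result = weighted_mean_by_course(data)
print(result)
```

Because np.average divides by the sum of the weights itself, this version does not rely on the weights adding up to 100.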