Is there a way to multiply the values in two columns while grouping by values in a third column using pandas?

I'm trying to avoid using a loop while calculating the weighted mean of the marks in each of these courses.

I just can't wrap my head around what to do. I assume I can use groupby and perform the appropriate calculations?

This is the dataframe:

data = 

mark  weight  course_id
78      10          1
87      40          1
15      50          1
78      90          3
40      10          3

This is the desired result:

result=

course_id  course_average
1            50.1
3            74.2      

This is one way to go about it:

(df.assign(course_average=df.mark * df.weight)
   .groupby("course_id")
   .pipe(lambda x: x.course_average.sum().div(x.weight.sum()))
   .reset_index(name="course_average"))


    course_id   course_average
0      1         50.1
1      3         74.2
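
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question (building it from a dict and the names `df`, `weighted_mark` and `sums` are my own assumptions, not part of the original post) and computes the same ratio of summed products to summed weights:

```python
import pandas as pd

# Rebuild the frame shown in the question (construction is an assumption).
df = pd.DataFrame({
    "mark": [78, 87, 15, 78, 40],
    "weight": [10, 40, 50, 90, 10],
    "course_id": [1, 1, 1, 3, 3],
})

# Row-wise product mark * weight, stored in a helper column.
weighted = df.assign(weighted_mark=df["mark"] * df["weight"])

# Per-course totals of the products and of the weights.
sums = weighted.groupby("course_id")[["weighted_mark", "weight"]].sum()

# Weighted average = sum(mark * weight) / sum(weight) for each course.
result = (sums["weighted_mark"] / sums["weight"]).reset_index(name="course_average")
print(result)
```

Dividing by the summed weights rather than by a hard-coded 100 means the same code should still hold if a course's weights happen not to total 100.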

If the weights don't always add up to 100 within each group, you can calculate each row's share of its group's total weight and multiply that share by mark.

(data.assign(wa = data['mark'] * data['weight'] / 
             data.groupby('course_id')['weight'].transform('sum'))
     .groupby('course_id')['wa'].sum())
Out[1]: 
course_id
1    50.1
3    74.2
Name: wa, dtype: float64
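
To see why the normalisation matters, here is a small sketch on made-up data where course 1's weights only total 50 (the frame below is hypothetical, not the one from the question, and the name `share` is mine). By hand, course 1 should be (78*10 + 87*40) / 50 = 85.2, and course 3 stays at 74.2:

```python
import pandas as pd

# Hypothetical data: course 1's weights sum to 50, course 3's to 100.
data = pd.DataFrame({
    "mark": [78, 87, 78, 40],
    "weight": [10, 40, 90, 10],
    "course_id": [1, 1, 3, 3],
})

# Each row's share of its own course's total weight.
share = data["weight"] / data.groupby("course_id")["weight"].transform("sum")

# Sum mark * share within each course to get the weighted average.
wa = (data["mark"] * share).groupby(data["course_id"]).sum()
print(wa)
```

Dividing by a fixed 100 here would give 42.6 for course 1 instead, so the `transform('sum')` version is the safer default when you can't guarantee the totals.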

If the weights do add up to 100 for each group, then the calculation is easier:

data.assign(wa = data['mark'] * data['weight'] / 100).groupby('course_id')['wa'].sum()

Out[2]: 
course_id
1    50.1
3    74.2
Name: wa, dtype: float64

You can do this in a simple one-liner using groupby and a lambda for the weighted average, as follows:

df.groupby(['course_id']).apply(lambda x: sum(x['mark']*x['weight'])/sum(x['weight']))
course_id
1    50.1
3    74.2
dtype: float64
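
As a variation on the same idea (not what the answer above uses), `numpy.average` accepts a `weights` argument, so the lambda can hand the arithmetic to NumPy. This is only a sketch: the frame is rebuilt from the question, and selecting the `mark` and `weight` columns before `apply` is my own choice, made on the assumption that it keeps the grouping column out of the lambda on newer pandas versions:

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown in the question (construction is an assumption).
df = pd.DataFrame({
    "mark": [78, 87, 15, 78, 40],
    "weight": [10, 40, 50, 90, 10],
    "course_id": [1, 1, 1, 3, 3],
})

# np.average(values, weights=w) computes sum(values * w) / sum(w).
result = (
    df.groupby("course_id")[["mark", "weight"]]
      .apply(lambda g: np.average(g["mark"], weights=g["weight"]))
)
print(result)
```

Like any `apply` with a Python lambda, this runs once per group, so on a frame with many groups the `assign` plus `groupby(...).sum()` approaches are likely to be faster.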

