Pandas: mean of one column, grouped by the values of other columns

Question

I want to output a dataframe that shows the average rating (based on "AverageRating") for each of the listed genres.

Below is a truncated version of the table as an example:

MovieID Movie_title     Action  Adventure   Animation   AverageRating               
1   Toy Story (1995)    0   0   1   3.878319
2   GoldenEye (1995)    1   1   0   3.206107
3   Four Rooms (1995)   0   0   0   3.033333
4   Get Shorty (1995)   1   0   0   3.550239
5   Copycat (1995)      0   0   0   3.302326
6   Shanghai Triad      0   0   0   3.576923
7   Twelve Monke(1995)  0   0   0   3.798469
8   Babe (1995)         0   0   0   3.995434
9   Dead Man W (1995)   0   0   0   3.896321
10  Richard III (1995)  0   0   0   3.831461
11  Seven (1995)        0   0   0   3.847458
12  Usual Suspec (1995) 0   0   0   4.385768
13  Mighty Aphro (1995) 0   0   0   3.418478
14  Postino, Il (1994)  0   0   0   3.967213
15  Mr. Holland's(1995) 0   0   0   3.778157
16  French Twist (1995) 0   0   0   3.205128
17  From Dusk Till      1   0   0   3.119565
18  White Balloon       0   0   0   2.800000
19  Antonia's Line      0   0   0   3.956522
20  Angels and Insects  0   0   0   3.416667
21  Muppet Treasure     1   1   0   2.761905
22  Braveheart (1995)   1   0   0   4.151515
23  Taxi Driver (1976)  0   0   0   4.120879
24  Rumble in the       1   1   0   3.448276
25  Birdcage            0   0   0   3.443686

So I need to take the movies with Action = 1 (and likewise for each other genre), compute the mean of their AverageRating, and then build a dataframe like:

            AverageRating
Action      2.97
Adventure   3.14
Animation   3.30
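The procedure just described (filter on each genre flag, average AverageRating, collect the results) can be written directly as a dictionary comprehension. This is a minimal sketch, assuming the genre column names are known in advance; the eight rows are copied from the sample table above (every movie with at least one genre flag set, plus "Four Rooms", which has none).

```python
import pandas as pd

# Rows copied from the sample table: every movie with at least one genre flag set,
# plus "Four Rooms (1995)", which has none.
df = pd.DataFrame({
    "MovieID": [1, 2, 3, 4, 17, 21, 22, 24],
    "Movie_title": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)",
                    "Get Shorty (1995)", "From Dusk Till", "Muppet Treasure",
                    "Braveheart (1995)", "Rumble in the"],
    "Action":    [0, 1, 0, 1, 1, 1, 1, 1],
    "Adventure": [0, 1, 0, 0, 0, 1, 0, 1],
    "Animation": [1, 0, 0, 0, 0, 0, 0, 0],
    "AverageRating": [3.878319, 3.206107, 3.033333, 3.550239,
                      3.119565, 2.761905, 4.151515, 3.448276],
})

genres = ["Action", "Adventure", "Animation"]  # assumed to be known in advance

# For each genre: keep only the movies whose flag is 1 and average their ratings.
means = {g: df.loc[df[g] == 1, "AverageRating"].mean() for g in genres}

# One row per genre and a single "AverageRating" column, as in the desired output.
result = pd.Series(means, name="AverageRating").to_frame()
print(result.round(2))
```

On these rows the means agree with the Output shown in Answer 1 (Action ≈ 3.3729, Adventure ≈ 3.1388, Animation ≈ 3.8783), since every genre member from the sample table is included.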

Using df.groupby, doing this for a single column is easy:

df.groupby(['Action'])['AverageRating'].mean()

Action
0    3.095288
1    2.966332
Name: AverageRating, dtype: float64
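The single-column groupby above can be extended to every genre at once by first reshaping the genre flags into long form with DataFrame.melt, so that one groupby covers all of them. This is an alternative sketch, not taken from the thread; it assumes the genre flags are 0/1 integers and reuses rows copied from the sample table.

```python
import pandas as pd

# Rows copied from the sample table (titles omitted; they are not needed here).
df = pd.DataFrame({
    "MovieID": [1, 2, 3, 4, 17, 21, 22, 24],
    "Action":    [0, 1, 0, 1, 1, 1, 1, 1],
    "Adventure": [0, 1, 0, 0, 0, 1, 0, 1],
    "Animation": [1, 0, 0, 0, 0, 0, 0, 0],
    "AverageRating": [3.878319, 3.206107, 3.033333, 3.550239,
                      3.119565, 2.761905, 4.151515, 3.448276],
})
genres = ["Action", "Adventure", "Animation"]

# Long form: one row per (movie, genre) pair, with the 0/1 flag in "flag".
long = df.melt(id_vars=["MovieID", "AverageRating"], value_vars=genres,
               var_name="Genre", value_name="flag")

# Keep only the pairs where the movie belongs to the genre, then average per genre.
result = (long[long["flag"] == 1]
          .groupby("Genre")["AverageRating"]
          .mean()
          .to_frame())
print(result)
```

Unlike df.groupby(['Action']), which returns one mean for the 0 group and one for the 1 group, filtering on flag == 1 first leaves exactly one mean per genre.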

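...but I'm struggling to work out how to do several columns at once in the form I want. I know there must be a simple way to do this that I'm missing. Any help is much appreciated!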

Answer 1

IIUC, you can multiply the genre columns by AverageRating, mask out the zeros, and then take the mean of each column:

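cats = df.iloc[:, 2:-1]                        # genre columns between Movie_title and AverageRating
s = cats.mul(df.AverageRating, axis='index')   # a movie's rating where its genre flag is 1, else 0
s.mask(s.eq(0)).mean()                         # turn the zeros into NaN, then average each genre
# per comment, this is a better option: mask on the flags instead of the product,
# so that a genuine rating of 0 is not dropped
# s.mask(cats.eq(0)).mean()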

Output:

Action       3.372934
Adventure    3.138763
Animation    3.878319
dtype: float64
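The commented alternative matters only when a rating can itself be 0: masking on s.eq(0) then also hides that genuine 0, while masking on cats.eq(0) hides only the movies outside the genre. The toy DataFrame below is hypothetical (movie "C", tagged Action and Adventure, is given an AverageRating of exactly 0) and exists only to show the difference.

```python
import pandas as pd

# Hypothetical toy data: movie "C" belongs to both genres and is rated exactly 0.0.
df = pd.DataFrame({
    "MovieID": [1, 2, 3],
    "Movie_title": ["A", "B", "C"],
    "Action":    [1, 0, 1],
    "Adventure": [0, 1, 1],
    "AverageRating": [4.0, 3.0, 0.0],
})

cats = df.iloc[:, 2:-1]                           # genre columns (Action, Adventure)
s = cats.mul(df["AverageRating"], axis="index")   # rating where the flag is 1, else 0

# Masking on the product also hides movie C's real 0.0 rating.
by_product = s.mask(s.eq(0)).mean()
# Masking on the flags hides only the movies outside each genre.
by_flag = s.mask(cats.eq(0)).mean()

print(by_product)
print(by_flag)
```

Here by_product reports 4.0 for Action because movie C's 0.0 was treated as missing, whereas by_flag gives 2.0, the actual mean of 4.0 and 0.0.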


Solution 1
4 2019-11-15 19:02:04
