Pandas mean of one column, by value of other columns
I'd like to output a dataframe showing the mean rating (based on "AverageRating") for each of the listed genres.
Here is a truncated version of the table as an example:
MovieID Movie_title Action Adventure Animation AverageRating
1 Toy Story (1995) 0 0 1 3.878319
2 GoldenEye (1995) 1 1 0 3.206107
3 Four Rooms (1995) 0 0 0 3.033333
4 Get Shorty (1995) 1 0 0 3.550239
5 Copycat (1995) 0 0 0 3.302326
6 Shanghai Triad 0 0 0 3.576923
7 Twelve Monke(1995) 0 0 0 3.798469
8 Babe (1995) 0 0 0 3.995434
9 Dead Man W (1995) 0 0 0 3.896321
10 Richard III (1995) 0 0 0 3.831461
11 Seven (1995) 0 0 0 3.847458
12 Usual Suspec (1995) 0 0 0 4.385768
13 Mighty Aphro (1995) 0 0 0 3.418478
14 Postino, Il (1994) 0 0 0 3.967213
15 Mr. Holland's(1995) 0 0 0 3.778157
16 French Twist (1995) 0 0 0 3.205128
17 From Dusk Till 1 0 0 3.119565
18 White Balloon 0 0 0 2.800000
19 Antonia's Line 0 0 0 3.956522
20 Angels and Insects 0 0 0 3.416667
21 Muppet Treasure 1 1 0 2.761905
22 Braveheart (1995) 1 0 0 4.151515
23 Taxi Driver (1976) 0 0 0 4.120879
24 Rumble in the 1 1 0 3.448276
25 Birdcage 0 0 0 3.443686
So I need to take the movies where Action = 1 (and so on for each genre), compute the mean of AverageRating for those movies, and then create a dataframe such as:
AverageRating
Action 2.97
Adventure 3.14
Animation 3.30
Using pandas groupby, it's easy enough to do this for one column:
df.groupby(['Action'])['AverageRating'].mean()
Action
0 3.095288
1 2.966332
Name: AverageRating, dtype: float64
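...but I'm struggling to work out how to do this for several columns at once in the desired way. I know there must be a simple way that I'm missing. Any help is greatly appreciated!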
IIUC, you can multiply the category columns by AverageRating and then take the mean over a mask:
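# genre indicator columns: everything between Movie_title and AverageRating
cats = df.iloc[:, 2:-1]
# copy each movie's rating into the genre columns it belongs to (0 elsewhere)
s = cats.mul(df.AverageRating, axis='rows')
s.mask(s.eq(0)).mean()
# per comment, this is a better option: masking on the indicators
# keeps a genuine 0.0 rating in the mean
# s.mask(cats.eq(0)).mean()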
Output:
Action 3.372934
Adventure 3.138763
Animation 3.878319
dtype: float64
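For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It is only an illustration under a few assumptions: the rows are copied from the sample table in the question (the omitted rows have 0 in all three genre columns, so they would not change the means), the genre columns are selected by name through a hypothetical `genres` list rather than by position, and `to_frame` is used to get the one-column dataframe shape the question asks for.

```python
import pandas as pd

# A subset of the sample rows from the question (titles as they appear there).
df = pd.DataFrame({
    "MovieID": [1, 2, 3, 4, 17, 21, 22, 24],
    "Movie_title": [
        "Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)",
        "Get Shorty (1995)", "From Dusk Till", "Muppet Treasure",
        "Braveheart (1995)", "Rumble in the",
    ],
    "Action":    [0, 1, 0, 1, 1, 1, 1, 1],
    "Adventure": [0, 1, 0, 0, 0, 1, 0, 1],
    "Animation": [1, 0, 0, 0, 0, 0, 0, 0],
    "AverageRating": [3.878319, 3.206107, 3.033333, 3.550239,
                      3.119565, 2.761905, 4.151515, 3.448276],
})

# Assumption: the genre columns are listed explicitly instead of sliced by position.
genres = ["Action", "Adventure", "Animation"]
cats = df[genres]

# Put each movie's rating into every genre column, then blank out (NaN)
# the cells where the movie does not belong to that genre.
ratings = cats.mul(df["AverageRating"], axis=0).mask(cats.eq(0))

# mean() skips NaN by default, so each column averages only its own movies.
result = ratings.mean().to_frame("AverageRating")
print(result.round(2))  # Action 3.37, Adventure 3.14, Animation 3.88
```

Selecting the columns by name is a little more robust than `iloc[:, 2:-1]` if the column order ever changes, at the cost of having to maintain the list.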