How to solve the error "name is not defined" in a function in Python
Suppose I have a dataset with the columns movieId, title, rating, and year. Here is a sample subset of my data:
| movieId | title | rating | year |
|:-------:|:--------:|:------:|:----:|
| 1 | abc | 3.5 | 1995 |
| 1 | abc | 3 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 3 | 1995 |
| 1 | abc | 5 | 1995 |
| 1 | abc | 3.5 | 1995 |
| 1 | abc | 4.5 | 1995 |
| 1 | abc | 0.5 | 1995 |
| 1 | abc | 3.5 | 1995 |
| 1 | abc | 4.5 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 5 | 1995 |
| 1 | abc | 4.5 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 3 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 3.5 | 1995 |
| 1 | abc | 3 | 1995 |
| 1 | abc | 4 | 1995 |
| 1 | abc | 5 | 1995 |
| 1 | abc | 4.5 | 1995 |
| 1 | abc | 5 | 1995 |
| 2 | xyz | 3 | 2000 |
| 2 | xyz | 2 | 2000 |
| 2 | xyz | 3.5 | 2000 |
| 2 | xyz | 4 | 2000 |
| 2 | xyz | 3.5 | 2000 |
| 2 | xyz | 5 | 2000 |
| 2 | xyz | 3.5 | 2000 |
| 2 | xyz | 3 | 2000 |
| 2 | xyz | 3 | 2000 |
| 2 | xyz | 2 | 2000 |
| 2 | xyz | 3.5 | 2000 |
| 2 | xyz | 3 | 2000 |
| 2 | xyz | 3 | 2000 |
| 2 | xyz | 4 | 2000 |
| 2 | xyz | 2 | 2000 |
| 2 | xyz | 3.5 | 2000 |
| 2 | xyz | 1 | 2000 |
| 3 | pqr | 3 | 1997 |
| 3 | pqr | 2 | 1997 |
| 3 | pqr | 3.5 | 1997 |
| 3 | pqr | 3.5 | 1997 |
| 3 | pqr | 3 | 1997 |
| 3 | pqr | 3 | 1997 |
| 3 | pqr | 3 | 1997 |
| 3 | pqr | 3 | 1997 |
| 3 | pqr | 4.5 | 1997 |
| 3 | pqr | 3.5 | 1997 |
| 3 | pqr | 4 | 1997 |
| 3 | pqr | 1.5 | 1997 |
| 3 | pqr | 2 | 1997 |
| 3 | pqr | 2 | 1997 |
| 3 | pqr | 2.5 | 1997 |
| 4 | def | 3 | 1999 |
| 4 | def | 2.5 | 1999 |
| 4 | def | 2.5 | 1999 |
| 4 | def | 0.5 | 1999 |
| 4 | def | 2 | 1999 |
| 4 | def | 3 | 1999 |
| 5 | movie123 | 4 | 2006 |
| 5 | movie123 | 4 | 2006 |
| 5 | movie123 | 3 | 2006 |
| 5 | movie123 | 1.5 | 2006 |
| 5 | movie123 | 3 | 2006 |
| 5 | movie123 | 2 | 2006 |
| 5 | movie123 | 2.5 | 2006 |
| 5 | movie123 | 3 | 2006 |
| 5 | movie123 | 4 | 2006 |
| 5 | movie123 | 0.5 | 2006 |
| 5 | movie123 | 1 | 2006 |
| 5 | movie123 | 3.5 | 2006 |
| 5 | movie123 | 2 | 2006 |
| 5 | movie123 | 3 | 2006 |
| 5 | movie123 | 1.5 | 2006 |
| 5 | movie123 | 2.5 | 2006 |
| 5 | movie123 | 4 | 2006 |
| 5 | movie123 | 4 | 2006 |
| 5 | movie123 | 3.5 | 2006 |
| 5 | movie123 | 3 | 2006 |
| 6 | movie456 | 4 | 2012 |
| 6 | movie456 | 3.5 | 2012 |
| 6 | movie456 | 3.5 | 2012 |
| 6 | movie456 | 4 | 2012 |
| 6 | movie456 | 5 | 2012 |
| 6 | movie456 | 2.5 | 2012 |
| 6 | movie456 | 4 | 2012 |
| 6 | movie456 | 4 | 2012 |
| 6 | movie456 | 3.5 | 2012 |
| 6 | movie456 | 5 | 2012 |
| 6 | movie456 | 2 | 2012 |
| 6 | movie456 | 4 | 2012 |
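I want to define a function that uses each movie's mean rating and rating count, a minimum number of ratings, and the mean rating across the whole dataset. So I first compute the number of ratings and the mean rating for each movie.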
avg_rating = df.groupby(['movieId', 'year'])['rating'].agg([('Count', 'size'), ('Mean', 'mean')]).sort_values(by='Mean', ascending=False)
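Some movies may have only a few reviews but a high rating, while others have many reviews and a high rating, so the analysis could be biased. I therefore want to compute a weighted average, and I defined a function for it.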
# R = average for the movie (mean) = (Rating)
# v = number of ratings/reviews for the movie = (votes)
# m = minimum reviews required to be listed in the Top 250 movie list
# C = the mean rating across the whole report
def weighted_rating(R, v, m, C):
    return (v/(v+m))*R + (m/(v+m))*C
avg_rating= avg_rating.assign(wr = weighted_rating(mean, count, 500, mean(mean)))
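When I run the last line of code above, I get an error: name 'mean' is not defined
The same error occurs for the count column: name 'count' is not defined
How can I fix this error so that my final output contains the columns movieId, year, count, mean, and wr?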
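Bare `mean` and `count` are not variables in scope, so Python raises a NameError. You need to pass the columns of the DataFrame instead: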
# mean() here is statistics.mean (from statistics import mean)
avg_rating = avg_rating.assign(
    wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))
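To make the cause of the error concrete, here is a minimal sketch that reproduces the NameError and then applies the fix. The tiny DataFrame is made up for illustration (it is not the question's data), and the sketch swaps in two things that are not in the original code: named aggregation (`agg(Count=..., Mean=...)`) and the pandas `Series.mean()` method in place of `statistics.mean`.

```python
import pandas as pd


def weighted_rating(R, v, m, C):
    # Same formula as in the question: blend the movie mean R with the overall mean C.
    return (v / (v + m)) * R + (m / (v + m)) * C


# Made-up illustrative data (assumption, not the question's dataset).
df = pd.DataFrame({
    "movieId": [1, 1, 1, 2, 2],
    "year": [1995, 1995, 1995, 2000, 2000],
    "rating": [4.0, 5.0, 3.0, 2.0, 3.0],
})

# Per-movie count and mean, using named aggregation.
avg_rating = df.groupby(["movieId", "year"])["rating"].agg(Count="size", Mean="mean")

# Bare names are looked up as Python variables, not as DataFrame columns.
try:
    weighted_rating(Mean, Count, 500, 0)
except NameError as exc:
    error_message = str(exc)
    print("NameError:", error_message)

# Fix: pass the columns themselves; C is the mean of the per-movie means.
C = avg_rating["Mean"].mean()
avg_rating = avg_rating.assign(
    wr=weighted_rating(avg_rating["Mean"], avg_rating["Count"], 500, C)
)
print(avg_rating)
```

Because `R` and `v` are whole columns (pandas Series), the arithmetic inside `weighted_rating` is applied element-wise, so one call computes `wr` for every movie.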
So the final code could look like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function,
                        unicode_literals)

import pandas as pd
from statistics import mean


def weighted_rating(R, v, m, C):
    # R: movie mean, v: number of ratings, m: minimum ratings, C: overall mean
    return (v / (v + m)) * R + (m / (v + m)) * C


def main():
    df = pd.read_csv('movies.csv')
    # Count and mean rating per movie, sorted by mean (highest first)
    avg_rating = df.groupby(['movieId', 'year'])['rating'].agg(
        [('Count', 'size'), ('Mean', 'mean')]).sort_values(by='Mean', ascending=False)
    # Pass the DataFrame columns, not bare names, to the function
    avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))
    print(avg_rating)


if __name__ == '__main__':
    main()
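The question asks for a final output with movieId, year, count, mean, and wr as columns, whereas the groupby result keeps movieId and year in the index. One possible way to flatten it is `reset_index()`, sketched below. The helper name `add_weighted_rating`, the lowercase column names, and the sample values are assumptions for illustration. `C` is taken as the mean of the per-movie means, as in the code above; using `df['rating'].mean()` over all rows would be another reasonable reading of "mean rating across the whole report".

```python
import pandas as pd


def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


def add_weighted_rating(df, m=500):
    """Hypothetical helper: per-movie count, mean and weighted rating as flat columns."""
    stats = (
        df.groupby(["movieId", "year"])["rating"]
        .agg(count="size", mean="mean")  # named aggregation
        .reset_index()                   # movieId and year become ordinary columns
    )
    C = stats["mean"].mean()             # mean of the per-movie means
    stats["wr"] = weighted_rating(stats["mean"], stats["count"], m, C)
    return stats.sort_values("wr", ascending=False).reset_index(drop=True)


# Made-up sample rows with the same column layout as movies.csv.
sample = pd.DataFrame({
    "movieId": [1, 1, 2, 2, 2],
    "title": ["abc", "abc", "xyz", "xyz", "xyz"],
    "rating": [4.0, 5.0, 2.0, 3.0, 4.0],
    "year": [1995, 1995, 2000, 2000, 2000],
})

result = add_weighted_rating(sample)
print(result)
```

Note that `stats["count"]` and `stats["mean"]` need bracket access, since `stats.count` and `stats.mean` would refer to DataFrame methods.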
The `main()` script reads the following CSV, saved as `movies.csv`:
movieId,title,rating,year
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3,1995
1,abc,5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,0.5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,5,1995
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,4,2000
2,xyz,3.5,2000
2,xyz,5,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,4,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,1,2000
3,pqr,3,1997
3,pqr,2,1997
3,pqr,3.5,1997
3,pqr,3.5,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,4.5,1997
3,pqr,3.5,1997
3,pqr,4,1997
3,pqr,1.5,1997
3,pqr,2,1997
3,pqr,2,1997
3,pqr,2.5,1997
4,def,3,1999
4,def,2.5,1999
4,def,2.5,1999
4,def,0.5,1999
4,def,2,1999
4,def,3,1999
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,3,2006
5,movie12,2,2006
5,movie12,2.5,2006
5,movie12,3,2006
5,movie12,4,2006
5,movie12,0.5,2006
5,movie12,1,2006
5,movie12,3.5,2006
5,movie12,2,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,2.5,2006
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3.5,2006
5,movie12,3,2006
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,3.5,2012
6,movie45,4,2012
6,movie45,5,2012
6,movie45,2.5,2012
6,movie45,4,2012
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,5,2012
6,movie45,2,2012
6,movie45,4,2012
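One thing worth checking on a sample this small: the largest group here (movieId 1) has only 25 ratings, so with m = 500 the movie's own mean gets a weight of at most 25/525, and every `wr` ends up close to `C`. The sketch below shows the effect with made-up values for `R` and `C`; the alternative `m = 10` is only an illustration, not a recommendation.

```python
def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


# Made-up values: a movie with 25 ratings averaging 4.0, overall mean 3.0.
R, v, C = 4.0, 25, 3.0

own_weight_500 = v / (v + 500)          # share of wr that comes from the movie's own mean
wr_500 = weighted_rating(R, v, 500, C)  # stays close to C
wr_10 = weighted_rating(R, v, 10, C)    # moves much closer to R

print(round(own_weight_500, 4), round(wr_500, 4), round(wr_10, 4))
```

On the full dataset, where movies can have hundreds or thousands of ratings, m = 500 behaves very differently, so the value is best chosen relative to the typical rating counts.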