How to solve the error "name is not defined" in a function in Python

Suppose I have a dataset with the columns movieId, title, rating, and year. Here is a sample subset of my data:

| movieId |   title  | rating | year |
|:-------:|:--------:|:------:|:----:|
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 0.5    | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 5      | 1995 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 5      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 1      | 2000 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 4.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 4      | 1997 |
| 3       | pqr      | 1.5    | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2.5    | 1997 |
| 4       | def      | 3      | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 0.5    | 1999 |
| 4       | def      | 2      | 1999 |
| 4       | def      | 3      | 1999 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 0.5    | 2006 |
| 5       | movie123 | 1      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2      | 2012 |
| 6       | movie456 | 4      | 2012 |

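I want to define a function that computes the mean, the count, the minimum rating, and the average rating across the whole dataset. So I first compute the number of ratings and the mean rating for each movie.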

avg_rating = df.groupby(['movieId','year'])['rating'].agg([('Count','size'), ('Mean','mean')]).sort_values(by='Mean',ascending=False)

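Some movies may have only a few reviews but a high rating, while others have many reviews and a high rating, so the analysis could be biased. I therefore want to compute a weighted average, and I defined a function for it.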

# R = average for the movie (mean) = (Rating)
# v = number of ratings/reviews for the movie = (votes)
# m = minimum reviews required to be listed in the Top 250 movie list
# C = the mean rating across the whole report

def weighted_rating(R, v, m, C): 
  return (v/(v+m))*R + (m/(v+m))*C
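To see what this weighting does, here is a minimal sketch that evaluates the same formula on two hypothetical movies. The numbers (C = 3.0, the vote counts, and the per-movie means) are made up for illustration and do not come from the dataset above; only m = 500 matches the question.

```python
def weighted_rating(R, v, m, C):
    """Pull a movie's mean rating R toward the overall mean C.

    v is the number of ratings the movie has; m is the vote threshold.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical values, chosen only for illustration.
C = 3.0   # assumed overall mean rating
m = 500   # threshold used in the question

# A movie with a perfect mean but only 5 ratings stays close to C.
few_votes = weighted_rating(R=5.0, v=5, m=m, C=C)

# A movie with a lower mean but 5000 ratings keeps most of its own mean.
many_votes = weighted_rating(R=4.0, v=5000, m=m, C=C)

print(round(few_votes, 3))   # 3.02
print(round(many_votes, 3))  # 3.909
```

With few ratings the result is dominated by C, so a handful of perfect scores cannot push a movie to the top of the ranking.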

avg_rating= avg_rating.assign(wr = weighted_rating(mean, count, 500, mean(mean)))

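When I run the last line of code above, I get an error: name 'mean' is not defined

The same error occurs for the count column: name 'count' is not defined

How can I resolve this error so that my final output contains the columns movieId, year, count, mean and wr?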

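You need to reference the columns of the aggregated DataFrame, avg_rating['Mean'] and avg_rating['Count'], instead of the bare names mean and count (the mean() call here is the function from the statistics module, imported in the full listing):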

avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))
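The reason for the error can be reproduced in isolation. The sketch below uses a tiny made-up frame in place of the real avg_rating table: a bare name such as Mean is looked up as a Python variable, not as a column, so it raises NameError, while the column itself is reachable through the DataFrame.

```python
import pandas as pd

# Tiny made-up frame standing in for the aggregated avg_rating table.
avg_rating = pd.DataFrame({"Count": [25, 6], "Mean": [3.9, 2.25]})

try:
    Mean  # bare name: Python searches for a variable called Mean
except NameError as exc:
    message = str(exc)
    print(message)  # name 'Mean' is not defined

# Columns are accessed through the DataFrame instead.
overall_mean = avg_rating["Mean"].mean()
print(overall_mean)
```

Note also that column labels are case-sensitive: the aggregation in the question names the columns 'Count' and 'Mean', so avg_rating['mean'] would fail with a KeyError.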

So the final code could look like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import (division, absolute_import, print_function,
                        unicode_literals)

import pandas as pd
from statistics import mean


def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


def main():

    df = pd.read_csv('movies.csv')

    avg_rating = df.groupby(['movieId', 'year'])['rating'].agg(
        [('Count', 'size'), ('Mean', 'mean')]).sort_values(by='Mean', ascending=False)

    avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))

    print(avg_rating)


if __name__ == '__main__':
    main()
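A possible variant, sketched here under assumptions rather than taken from the answer: it uses only pandas (Series.mean() instead of statistics.mean), passes a lambda to assign so the intermediate frame does not need a name, and calls reset_index() so that movieId and year become ordinary columns, as the question asks. The five-row frame is made up for illustration and merely reuses the column names of movies.csv.

```python
import pandas as pd


def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


# Small made-up sample with the same column names as movies.csv.
df = pd.DataFrame({
    "movieId": [1, 1, 2, 2, 2],
    "year": [1995, 1995, 2000, 2000, 2000],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
})

result = (
    df.groupby(["movieId", "year"])["rating"]
    .agg([("Count", "size"), ("Mean", "mean")])
    # The lambda receives the aggregated frame, so its columns are in reach.
    .assign(wr=lambda d: weighted_rating(d["Mean"], d["Count"], 500, d["Mean"].mean()))
    .sort_values(by="wr", ascending=False)
    # Turn the movieId/year index levels back into regular columns.
    .reset_index()
)

print(result)
```

Like the answer, this takes C as the mean of the per-movie means. If C is meant to be the mean over every individual rating, df["rating"].mean() would be the value to pass instead.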

Using this CSV (movies.csv):

movieId,title,rating,year
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3,1995
1,abc,5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,0.5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,5,1995
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,4,2000
2,xyz,3.5,2000
2,xyz,5,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,4,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,1,2000
3,pqr,3,1997
3,pqr,2,1997
3,pqr,3.5,1997
3,pqr,3.5,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,4.5,1997
3,pqr,3.5,1997
3,pqr,4,1997
3,pqr,1.5,1997
3,pqr,2,1997
3,pqr,2,1997
3,pqr,2.5,1997
4,def,3,1999
4,def,2.5,1999
4,def,2.5,1999
4,def,0.5,1999
4,def,2,1999
4,def,3,1999
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,3,2006
5,movie12,2,2006
5,movie12,2.5,2006
5,movie12,3,2006
5,movie12,4,2006
5,movie12,0.5,2006
5,movie12,1,2006
5,movie12,3.5,2006
5,movie12,2,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,2.5,2006
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3.5,2006
5,movie12,3,2006
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,3.5,2012
6,movie45,4,2012
6,movie45,5,2012
6,movie45,2.5,2012
6,movie45,4,2012
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,5,2012
6,movie45,2,2012
6,movie45,4,2012

