简体   繁体   中英

How to solve the Error - name is not defined in a function in python

Let's assume that I have a dataset consisting the columns, movieId, title, year and ratings. Below is a sample subset of my data:

| movieId |   title  | rating | year |
|:-------:|:--------:|:------:|:----:|
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 0.5    | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 3.5    | 1995 |
| 1       | abc      | 3      | 1995 |
| 1       | abc      | 4      | 1995 |
| 1       | abc      | 5      | 1995 |
| 1       | abc      | 4.5    | 1995 |
| 1       | abc      | 5      | 1995 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 5      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 3      | 2000 |
| 2       | xyz      | 4      | 2000 |
| 2       | xyz      | 2      | 2000 |
| 2       | xyz      | 3.5    | 2000 |
| 2       | xyz      | 1      | 2000 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 3      | 1997 |
| 3       | pqr      | 4.5    | 1997 |
| 3       | pqr      | 3.5    | 1997 |
| 3       | pqr      | 4      | 1997 |
| 3       | pqr      | 1.5    | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2      | 1997 |
| 3       | pqr      | 2.5    | 1997 |
| 4       | def      | 3      | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 2.5    | 1999 |
| 4       | def      | 0.5    | 1999 |
| 4       | def      | 2      | 1999 |
| 4       | def      | 3      | 1999 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 0.5    | 2006 |
| 5       | movie123 | 1      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 2      | 2006 |
| 5       | movie123 | 3      | 2006 |
| 5       | movie123 | 1.5    | 2006 |
| 5       | movie123 | 2.5    | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 4      | 2006 |
| 5       | movie123 | 3.5    | 2006 |
| 5       | movie123 | 3      | 2006 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2.5    | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 4      | 2012 |
| 6       | movie456 | 3.5    | 2012 |
| 6       | movie456 | 5      | 2012 |
| 6       | movie456 | 2      | 2012 |
| 6       | movie456 | 4      | 2012 |

I want to calculate the average, count, minimum rating, and mean ratings across the whole dataset by defining a function. So, I first calculate the number and average of ratings for each movie.

avg_rating = df.groupby(['movieId','year'])['ratings'].agg([('Count','size'), ('Mean','mean')]).sort_values(by='Mean',ascending=False)

Since, some movies may have less number of reviews but a higher rating Vs other movies with higher number of reviews and higher rating, the analysis might get skewed. So, I want to calculate weighted average, for which I have defined a function.

# R = average for the movie (mean) = (Rating)
# v = number of ratings/reviews for the movie = (votes)
# m = minimum reviews required to be listed in the Top 250 movie list
# C = the mean rating across the whole report

def weighted_rating(R, v, m, C): 
  return (v/(v+m))*R + (m/(v+m))*C

avg_rating= avg_rating.assign(wr = weighted_rating(mean, count, 500, mean(mean)))

When I run the above last line of code, I get an error: name 'mean' is not defined

The same error persists for column count: name 'count' is not defined

How can I solve this error so that my final output contains the columns movieId, year, count, mean, and wr?

You need to use the fields from your df :

avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))

So the final code could be like:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import (division, absolute_import, print_function,
                        unicode_literals)

import pandas as pd
from statistics import mean


def weighted_rating(R, v, m, C):
    return (v / (v + m)) * R + (m / (v + m)) * C


def main():

    df = pd.read_csv('movies.csv')

    avg_rating = df.groupby(['movieId', 'year'])['rating'].agg(
        [('Count', 'size'), ('Mean', 'mean')]).sort_values(by='Mean', ascending=False)

    avg_rating = avg_rating.assign(
        wr=weighted_rating(avg_rating['Mean'], avg_rating['Count'], 500, mean(avg_rating['Mean'])))

    print(avg_rating)


if __name__ == '__main__':
    main()

Using the CSV:

movieId,title,rating,year
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3,1995
1,abc,5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,0.5,1995
1,abc,3.5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,4,1995
1,abc,3,1995
1,abc,4,1995
1,abc,3.5,1995
1,abc,3,1995
1,abc,4,1995
1,abc,5,1995
1,abc,4.5,1995
1,abc,5,1995
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,4,2000
2,xyz,3.5,2000
2,xyz,5,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,3,2000
2,xyz,3,2000
2,xyz,4,2000
2,xyz,2,2000
2,xyz,3.5,2000
2,xyz,1,2000
3,pqr,3,1997
3,pqr,2,1997
3,pqr,3.5,1997
3,pqr,3.5,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,3,1997
3,pqr,4.5,1997
3,pqr,3.5,1997
3,pqr,4,1997
3,pqr,1.5,1997
3,pqr,2,1997
3,pqr,2,1997
3,pqr,2.5,1997
4,def,3,1999
4,def,2.5,1999
4,def,2.5,1999
4,def,0.5,1999
4,def,2,1999
4,def,3,1999
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,3,2006
5,movie12,2,2006
5,movie12,2.5,2006
5,movie12,3,2006
5,movie12,4,2006
5,movie12,0.5,2006
5,movie12,1,2006
5,movie12,3.5,2006
5,movie12,2,2006
5,movie12,3,2006
5,movie12,1.5,2006
5,movie12,2.5,2006
5,movie12,4,2006
5,movie12,4,2006
5,movie12,3.5,2006
5,movie12,3,2006
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,3.5,2012
6,movie45,4,2012
6,movie45,5,2012
6,movie45,2.5,2012
6,movie45,4,2012
6,movie45,4,2012
6,movie45,3.5,2012
6,movie45,5,2012
6,movie45,2,2012
6,movie45,4,2012

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM