I have a DataFrame with a movie name column and 3 other columns (let's call them A, B, and C) that hold ratings from 3 different sources. Many movies have only one rating, some have ratings from a combination of the 3 sources, and some have no ratings at all. I want to create a new column that holds the first available rating for each movie, or "Unrated" if it has none.
This is what I have in my code so far:
def check_rating(rating):
    if newyear['Yahoo Rating'] != "\\N":
        return rating
    else:
        if newyear['Movie Mom Rating'] != "\\N":
            return rating
        else:
            if newyear['Critc Rating'] != "\\N":
                return rating
            else:
                return "Unrated"
df['Rating'] = df.apply(check_rating, axis=1)
The error I get is:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
For a visual of my dataframe, here is newyear.head():
I am not sure what this ValueError means or how to fix it, or whether this is even the right way to do it.
I would do something like this:
import numpy as np

df = df.replace('\\N', np.nan)  # treat the \N placeholder as missing
df['Rating'] = (df['Yahoo Rating'].fillna(df['Movie Mom Rating']
                                  .fillna(df['Critic Rating']
                                  .fillna("Unrated"))))
The reason your code doesn't work is that newyear['Yahoo Rating'] != "\\N" is a boolean array. What you are writing is something like if [True, False, True, False]:. That's the source of the ambiguity. How should such a condition be evaluated? Should the branch run only if all of the values are True, or would just one of them be enough?
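To make the ambiguity concrete, here is a minimal sketch. The three-row ratings Series is made-up sample data, not the asker's dataframe; it only shows that the comparison yields one boolean per row and that reducing it with any() or all() gives the single value an if statement needs.

```python
import pandas as pd

# Hypothetical sample column; "\\N" is the two-character placeholder \N.
ratings = pd.Series(["7", "\\N", "\\N"], name="Yahoo Rating")

# Element-wise comparison: one boolean per row, not a single True/False.
mask = ratings != "\\N"
print(mask.tolist())

try:
    if mask:  # pandas refuses to pick a truth value for a whole Series
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Reducing the Series to one boolean removes the ambiguity.
any_rated = bool(mask.any())  # is at least one row rated?
all_rated = bool(mask.all())  # is every row rated?
```

Neither reduction is what the question needs, though: the goal is a decision per row, not one decision for the whole column.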
As M. Klugerford explained, you can change it so that it is evaluated row by row (and therefore returns a single value). However, row-by-row apply operations are generally slow, and pandas has great tools for handling missing data. That's why I am suggesting this approach.
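As a rough end-to-end sketch of the fillna approach, the snippet below builds a small dataframe and fills the new column without apply. The sample values and the rating_cols helper list are assumptions for illustration, and the column names are taken from the question, so adjust them to match your data. The flat chain of fillna calls should give the same result as the nested form.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "Title": ["m1", "m2", "m3", "m4", "m5"],
    "Yahoo Rating": ["7", "\\N", "\\N", "\\N", "\\N"],
    "Movie Mom Rating": ["6", "5", "\\N", "4", "\\N"],
    "Critic Rating": ["\\N", "7", "\\N", "1", "3"],
})

# Hypothetical helper list, in order of priority.
rating_cols = ["Yahoo Rating", "Movie Mom Rating", "Critic Rating"]

# Turn the \N placeholder into NaN so pandas treats it as missing.
ratings = df[rating_cols].replace("\\N", np.nan)

# Take the first source, fall back to the next where it is missing,
# and label rows with no rating at all as "Unrated".
df["Rating"] = (
    ratings["Yahoo Rating"]
    .fillna(ratings["Movie Mom Rating"])
    .fillna(ratings["Critic Rating"])
    .fillna("Unrated")
)

print(df[["Title", "Rating"]])
```

Working on a copy of the rating columns leaves the original \N placeholders in df untouched; replace them in df itself if you would rather have NaN everywhere.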
You are returning rating in your original function, but rating is the row, not the value of any column.
>>> df
    A   B   C Genre Title Year
0   7   6  \N    g1    m1   y1
1  \N   5   7    g2    m2   y2
2  \N  \N  \N    g3    m3   y3
3  \N   4   1    g4    m4   y4
4  \N  \N   3    g5    m5   y5
>>> def rating(row):
...     if row['A'] != r'\N':
...         return row['A']
...     if row['B'] != r'\N':
...         return row['B']
...     if row['C'] != r'\N':
...         return row['C']
...     return 'Unrated'
...
>>> df['Rating'] = df.apply(rating, axis=1)
>>> df
    A   B   C Genre Title Year   Rating
0   7   6  \N    g1    m1   y1        7
1  \N   5   7    g2    m2   y2        5
2  \N  \N  \N    g3    m3   y3  Unrated
3  \N   4   1    g4    m4   y4        4
4  \N  \N   3    g5    m5   y5        3