I have a data frame with the columns Movie Title and Cast.
Column 1 has the name of the movie, whilst Column 2 lists the full cast of the film. The cast has been taken from the site TMDB.
Column 2 has the pattern: 'cast_id': {cast_id_number}, 'character': {character_name}, 'credit_id': {credit_number}, 'gender': {gender_identifier}, etc.
I am writing a project for school looking at the gender split in different films. I therefore want to create columns that count the number of male and female actors in each film, e.g.:
Movie Title | Cast | No. of Males | No. of Females
Toy Story | .... | 3 | 7
However, I'm not sure how to go about doing this. I've tried using str.count, but it keeps returning all values as 0, even if I can see a cell contains 'gender': 2 or 'gender': 1.
I'm assuming it may need a loop with a counter that reads the string in each row and adds 1 every time it encounters 'gender': 2, but I have no idea how to implement this.
You will need to iterate over each cast member for each movie and determine how many cast members are female/male. Something like this should work:
def gender_ct(data, gender=1):
    # Count the cast entries whose 'gender' value matches (TMDB uses 1 = female, 2 = male)
    return sum(1 for x in data if x['gender'] == gender)

df['No. of Females'] = df['Cast'].apply(gender_ct, gender=1)
df['No. of Males'] = df['Cast'].apply(gender_ct, gender=2)
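The function above expects each Cast cell to already be a list of dictionaries. If the cells are instead stored as text (which is common after loading from CSV, though the question does not confirm it), they would need to be parsed first. The following is a minimal sketch under that assumption; the name count_gender and the sample cell are made up for illustration and are not from the original data.

```python
import ast

def count_gender(cast, gender):
    """Count cast entries whose 'gender' field equals `gender`."""
    # Assumption: a cell may be a string such as "[{'cast_id': 1, ...}]";
    # ast.literal_eval safely turns that text back into a list of dicts.
    if isinstance(cast, str):
        cast = ast.literal_eval(cast)
    # .get() skips entries that lack a 'gender' key instead of raising KeyError
    return sum(1 for member in cast if member.get('gender') == gender)

# Made-up sample cell for illustration only, not real TMDB data
sample_cast = (
    "[{'cast_id': 1, 'character': 'A', 'gender': 2}, "
    "{'cast_id': 2, 'character': 'B', 'gender': 1}, "
    "{'cast_id': 3, 'character': 'C', 'gender': 2}]"
)

females = count_gender(sample_cast, gender=1)
males = count_gender(sample_cast, gender=2)
print(females, males)

# With the data frame from the question, the same function could be applied as:
# df['No. of Females'] = df['Cast'].apply(count_gender, gender=1)
# df['No. of Males'] = df['Cast'].apply(count_gender, gender=2)
```

Because the function accepts either a string or an already-parsed list, it can be applied to the Cast column without first checking how the column was loaded.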