I have this dataframe ( https://i.stack.imgur.com/hRD1H.jpg ) that I created from an SQL query. I want to create a bar graph that shows the frequency for each type of movie genre, so I can see what the top genre is.
My problem is that in the genre column, each value is made up of multiple genres, and I want to separate each into its own genre. So say I have a movie whose genre is "Action, Thriller". I want to be able to count that as two separate entries, one for "Action" and one for "Thriller".
I have been trying to work on this for days, but for the life of me I cannot figure out the syntax to do this. Should I do the actual separating in my SQL query, or should I do it when working with the DataFrame? Any help would be greatly appreciated.
I haven't used SQL in a long time, so I can't say how to do it there.
But in Python, I would do something like this:
def count_genre(genre_array):
    genre_array_sep = []
    counts = []
    for g in genre_array:
        genre_array_sep.append(g.split(", "))
    # print(genre_array_sep)
    options = ["Thriller", "Drama", "Action"]
    for op in options:
        count = 0
        for g in genre_array_sep:
            if op in g:
                count += 1
                g.remove(op)
        # print(genre_array_sep)
        counts.append(count)
    return counts

# input
film_genre = ["Thriller", "Drama", "Action, Thriller", "Action", "Action"]
# output
print(count_genre(film_genre))
But please bear in mind that I'm not a programmer so there is certainly a better/faster solution.
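One possible refinement of the same idea, sketched with the standard library's `collections.Counter` so the list of genres does not have to be hard-coded. The function name `count_genres` and the `sep` parameter are made up for illustration, and the sample list is the one from the snippet above, not the real query result.

```python
from collections import Counter


def count_genres(genre_strings, sep=","):
    """Count how often each individual genre appears.

    Each input string may hold several genres, e.g. "Action, Thriller".
    """
    counts = Counter()
    for entry in genre_strings:
        # Split on the separator and strip stray whitespace around each name.
        genres = [g.strip() for g in entry.split(sep) if g.strip()]
        counts.update(genres)
    return counts


film_genre = ["Thriller", "Drama", "Action, Thriller", "Action", "Action"]
genre_counts = count_genres(film_genre)

# most_common() sorts from most to least frequent, so the top genre comes first.
for genre, n in genre_counts.most_common():
    print(genre, n)
```

If the genres are already in a pandas DataFrame column (assumed here to be named `genre`; the real column name may differ), the same split-then-count idea can probably be written as `df["genre"].str.split(", ").explode().value_counts()`, and calling `.plot(kind="bar")` on that result should draw the bar graph.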