I'm trying to count the number of times something comes up in one column, and group it by another. For example, I have the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("C:\\Users\\user1\\Desktop\\genre_testing.csv")
This gives me the following data example set:
What I would like to be able to do later is count the number of "Adventure" shows/movies, and have both mandalorian and zombieland counted. I believe the first issue is that both columns are stored as objects, but I may need them as arrays?
Using something like
df.groupby('genre')['show_name'].nunique()
groups on the full genre string rather than on the individual genres, which are what I'm looking for. Any advice on where to start? Thanks!
pandas already has a built-in for this, Series.str.get_dummies, which should be pretty easy to use. It creates one 0/1 indicator column per genre:
df_coded = df['genre'].str.get_dummies(sep=",")
df_coded['show_name'] = df['show_name']
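To get from the indicator columns to the counts the question asks for, the columns can be summed. This is a minimal sketch rather than a definitive solution: the two-row frame below stands in for the CSV, and its rows are assumptions borrowed from the sample data in the other answers.

```python
import pandas as pd

# Stand-in for the CSV in the question; these rows are illustrative only.
df = pd.DataFrame({
    "show_name": ["mandalorian", "zombieland"],
    "genre": ["Adventure,Action,Sci-Fi", "Comedy,Adventure,Action"],
})

# One 0/1 indicator column per genre.
df_coded = df["genre"].str.get_dummies(sep=",")

# Summing each indicator column counts the shows tagged with that genre.
genre_counts = df_coded.sum()

# Names of the shows tagged "Adventure".
adventure_shows = df.loc[df_coded["Adventure"] == 1, "show_name"].tolist()

print(genre_counts)
print(adventure_shows)
```

Here the indicator frame is kept separate from show_name so that sum() only sees numeric columns.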
As already mentioned, you can use .str.split(',') to get the genre as a list. To take that a step further, once you have split the column you can explode your dataframe into a shape better suited to filtering, counting, and so on.
>>> import pandas
>>> data = pandas.DataFrame(data=[["mandalorian", "Adventure,Action,Sci-Fi"], ["zombieland", "Comedy,Adventure,Action"]], columns=["show_name", "genre"])
>>> data
show_name genre
0 mandalorian Adventure,Action,Sci-Fi
1 zombieland Comedy,Adventure,Action
>>> data['genre'] = data['genre'].str.split(',')
>>> data
show_name genre
0 mandalorian [Adventure, Action, Sci-Fi]
1 zombieland [Comedy, Adventure, Action]
>>> data = data.explode('genre')
>>> data
show_name genre
0 mandalorian Adventure
0 mandalorian Action
0 mandalorian Sci-Fi
1 zombieland Comedy
1 zombieland Adventure
1 zombieland Action
>>> data[data['genre'] == 'Adventure']['show_name']
0 mandalorian
1 zombieland
>>> data.groupby('genre')['show_name'].nunique()
genre
Action 2
Adventure 2
Comedy 1
Sci-Fi 1
Name: show_name, dtype: int64
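The split and explode steps above can also be chained so that the original frame is left untouched. The sketch below makes that assumption and additionally collects the show names per genre; the variable names exploded and shows_by_genre are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame(
    data=[["mandalorian", "Adventure,Action,Sci-Fi"],
          ["zombieland", "Comedy,Adventure,Action"]],
    columns=["show_name", "genre"],
)

# Split the genre string into a list, then emit one row per (show, genre) pair.
exploded = data.assign(genre=data["genre"].str.split(",")).explode("genre")

# Collect the show names belonging to each genre.
shows_by_genre = exploded.groupby("genre")["show_name"].agg(list)

# Count distinct shows per genre, as in the session above.
counts = exploded.groupby("genre")["show_name"].nunique()

print(shows_by_genre)
print(counts)
```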
Here is an alternative that might put you on the way. Assume your df is defined this way:
d = {'Show':["Zombieland","Madalorian","Star Wars","Spiderman"],'genre':["Adventure,SciFi", "Adventure,SciFi,Action","SciFi,Action","Comedy"]}
df = pd.DataFrame(d)
Which gives you
Show genre
0 Zombieland Adventure,SciFi
1 Madalorian Adventure,SciFi,Action
2 Star Wars SciFi,Action
3 Spiderman Comedy
What you want is to subset this df by keeping only those rows whose genre column contains, say, Action. You can do it this way:
df2 = df[df.genre.astype(str).str.contains('Action')]
which gives
Show genre
1 Madalorian Adventure,SciFi,Action
2 Star Wars SciFi,Action
You can then subset that result further, or simply count its rows:
count_row = df2.shape[0]
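One thing to keep in mind is that str.contains matches substrings, so a longer genre name that happens to contain "Action" (a hypothetical "Live-Action", say, which is not in the sample data) would match too. The sketch below shows the count taken directly from the boolean mask, plus an exact-match variant that splits on commas first; has_action and exact_action are illustrative names.

```python
import pandas as pd

d = {
    "Show": ["Zombieland", "Madalorian", "Star Wars", "Spiderman"],
    "genre": ["Adventure,SciFi", "Adventure,SciFi,Action", "SciFi,Action", "Comedy"],
}
df = pd.DataFrame(d)

# Boolean mask of rows whose genre string contains the literal text "Action".
has_action = df["genre"].str.contains("Action", regex=False)

# Summing a boolean mask counts the True values.
action_count = int(has_action.sum())

# Exact-match variant: split into a list of genres and test membership per row.
exact_action = df["genre"].str.split(",").apply(lambda genres: "Action" in genres)

print(action_count)
print(df[exact_action])
```

On this sample data both masks select the same two rows; they would only differ if one genre name were a substring of another.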