I have a pandas DataFrame that looks like this:
genre1     genre2     genre3   Votes1  votes2  votes3  ...  cnt
Comedy     Animation  Drama    8.3     7.0     8.5          1
Adventure  Comedy     Mystery  6.4     8.2     3.5          1
Drama      Music      Sci-Fi   3.8     6.2     5.9          1
.
.
.
I want to create 3 new DataFrames, one for each genre column, each grouped by the individual genres in that column and holding the sum of all the other numeric columns. I have tried different variations of groupby and sum in pandas, but I can't figure out how to combine them to get the result I want. Please share any ideas you might have. Thanks!
When you call df.groupby(...).sum(), you get a DataFrame with one column for each column that was summed, and its index holds the distinct groups.
You can also pass a list of column names to groupby(), so you could do: df.groupby(["genre1", "genre2", "genre3"])
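As a sketch of what that call does on the question's data, the snippet below rebuilds only the three sample rows and the visible columns (the elided "..." columns are left out, which is an assumption). Grouping by all three genre columns at once treats each *combination* of genres as one group. Because every sample row has a different combination, each group contains a single row, so this does not total the votes per individual genre.

```python
import pandas as pd

# The three sample rows from the question; the elided columns are omitted (assumption).
df = pd.DataFrame(
    {
        "genre1": ["Comedy", "Adventure", "Drama"],
        "genre2": ["Animation", "Comedy", "Music"],
        "genre3": ["Drama", "Mystery", "Sci-Fi"],
        "Votes1": [8.3, 6.4, 3.8],
        "votes2": [7.0, 8.2, 6.2],
        "votes3": [8.5, 3.5, 5.9],
        "cnt": [1, 1, 1],
    }
)

# Each unique (genre1, genre2, genre3) combination becomes one group,
# and the result gets a three-level MultiIndex.
combo = df.groupby(["genre1", "genre2", "genre3"]).sum()
print(combo)
```

Note that "Comedy" appears in both genre1 and genre2 here, yet its rows are never added together, because they belong to different combinations.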
Examples:
>>> import pandas as pd
>>> df = pd.DataFrame(
{
"hello": ["world", "brave", "world", "brave",],
"num1": [1, 2, 3, 4],
"num2": [1, 2, 3, 4]
}
)
>>> df
hello num1 num2
0 world 1 1
1 brave 2 2
2 world 3 3
3 brave 4 4
>>> df.groupby("hello").sum()
num1 num2
hello
brave 6 6
world 4 4
>>> df.groupby("hello").sum().columns
Index(['num1', 'num2'], dtype='object')
>>> df.groupby("hello").sum().index
Index(['brave', 'world'], dtype='object', name='hello')
>>> df = pd.DataFrame(
{
"hello1": ["world", "brave", "world", "brave",],
"hello2": ["new", "world", "brave", "new",],
"num1": [1, 2, 3, 4],
"num2": [1, 2, 3, 4]
}
)
>>> df.groupby(["hello1", "hello2"]).sum()
num1 num2
hello1 hello2
brave new 4 4
world 2 2
world brave 3 3
new 1 1
That should give you the summed totals, but if you want multiple DataFrames, you will need to create a new DataFrame for each column that you want on its own, for example by running a separate groupby for each genre column.
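One possible way to build the three DataFrames is a separate groupby per genre column, collected in a dict. This is a minimal sketch, assuming the genre columns are named genre1, genre2 and genre3 as in the question and that every other numeric column should be summed; the names genre_cols, num_cols and per_genre are illustrative only.

```python
import pandas as pd

# Sample rows from the question; the elided columns are omitted (assumption).
df = pd.DataFrame(
    {
        "genre1": ["Comedy", "Adventure", "Drama"],
        "genre2": ["Animation", "Comedy", "Music"],
        "genre3": ["Drama", "Mystery", "Sci-Fi"],
        "Votes1": [8.3, 6.4, 3.8],
        "votes2": [7.0, 8.2, 6.2],
        "votes3": [8.5, 3.5, 5.9],
        "cnt": [1, 1, 1],
    }
)

genre_cols = ["genre1", "genre2", "genre3"]

# Select only the numeric columns to sum. Without this, depending on the pandas
# version, the other genre (string) columns may be concatenated or dropped.
num_cols = list(df.select_dtypes(include="number").columns)

# One DataFrame per genre column, indexed by the genres found in that column.
per_genre = {col: df.groupby(col)[num_cols].sum() for col in genre_cols}

genre1_df = per_genre["genre1"]
genre2_df = per_genre["genre2"]
genre3_df = per_genre["genre3"]
print(genre1_df)
```

With the full data, any genre that repeats within one column is summed into a single row of that column's DataFrame. Calling df.groupby(col).sum(numeric_only=True) is an alternative way to skip the non-numeric columns.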