Pandas Dataframe groupby one column and sum of all other columns

Question

I have a pandas dataframe that looks like this

genre1    genre2    genre3   Votes1  votes2  votes3 ......… cnt
Comedy    Animation Drama    8.3     7.0     8.5            1
Adventure Comedy    Mystery  6.4     8.2     3.5            1
Drama     Music     Sci-Fi   3.8     6.2     5.9            1
.
.
.

I want to create 3 new dataframes, one per genre column, each grouped by that genre and containing the sum of all the other numerical columns. I have tried different variations of pandas groupby and sum, but I am unable to figure out how to apply them together to get this result. Please share any ideas that you might have. Thanks!

Answer 1

When you call df.groupby(...).sum(), you get a DataFrame with one column for each column that was summed, and the index holds the distinct group values.

Additionally, you can pass a list of column names to groupby(), which groups by each distinct combination of values in those columns. So you could do: df.groupby(["genre1", "genre2", "genre3"]).sum()

Examples:

>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "hello": ["world", "brave", "world", "brave"],
...         "num1": [1, 2, 3, 4],
...         "num2": [1, 2, 3, 4],
...     }
... )
>>> df
   hello  num1  num2
0  world     1     1
1  brave     2     2
2  world     3     3
3  brave     4     4
>>> df.groupby("hello").sum()
       num1  num2
hello
brave     6     6
world     4     4
>>> df.groupby("hello").sum().columns
Index(['num1', 'num2'], dtype='object')
>>> df.groupby("hello").sum().index
Index(['brave', 'world'], dtype='object', name='hello')

>>> df = pd.DataFrame(
...     {
...         "hello1": ["world", "brave", "world", "brave"],
...         "hello2": ["new", "world", "brave", "new"],
...         "num1": [1, 2, 3, 4],
...         "num2": [1, 2, 3, 4],
...     }
... )
>>> df.groupby(["hello1", "hello2"]).sum()
               num1  num2
hello1 hello2
brave  new        4     4
       world      2     2
world  brave      3     3
       new        1     1

That should give you the sums you are looking for. If you want a separate DataFrame per genre column, though, the combined result will not give you that directly: you would have to build a new DataFrame for each column, for example by grouping on that column alone.
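To get one DataFrame per genre column, as the question asks, one option is to run a separate groupby for each column. The following is only a sketch: the sample rows are made up to resemble the question's table (the fourth row is invented so that some groups contain more than one row), and the names genre_cols and per_genre are hypothetical. Passing numeric_only=True is my assumption about what the asker wants, since it keeps the other two genre columns out of each result instead of trying to sum their strings.

```python
import pandas as pd

# Made-up sample data shaped like the table in the question.
df = pd.DataFrame(
    {
        "genre1": ["Comedy", "Adventure", "Drama", "Comedy"],
        "genre2": ["Animation", "Comedy", "Music", "Music"],
        "genre3": ["Drama", "Mystery", "Sci-Fi", "Drama"],
        "Votes1": [8.3, 6.4, 3.8, 7.0],
        "votes2": [7.0, 8.2, 6.2, 5.0],
        "votes3": [8.5, 3.5, 5.9, 6.0],
        "cnt": [1, 1, 1, 1],
    }
)

genre_cols = ["genre1", "genre2", "genre3"]

# One grouped-and-summed DataFrame per genre column, keyed by column name.
# numeric_only=True restricts the sum to the numeric columns.
per_genre = {
    col: df.groupby(col).sum(numeric_only=True)
    for col in genre_cols
}

for col, summed in per_genre.items():
    print(f"--- grouped by {col} ---")
    print(summed)
```

Each value in per_genre is indexed by the genre values of its column; call .reset_index() on it if you would rather have the genre as an ordinary column.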
