Reshaping Data Frame while keeping categorical variables

Question

I've been trying to use a pivot table to reshape a dataframe that looks like this:

User  Product  Gender  Age  Cost

1     1        M       25   10
1     2        M       25   12
1     3        M       25   14
1     4        M       25   15
2     2        F       19   29
2     4        F       19   14
2     6        F       19   17
2     8        F       19   30

I want it to look like this:

User  Gender  Age  Cost

1     M       25   51
2     F       19   90

In other words, I want to sum Cost for each User while keeping the other per-user variables (Gender and Age) in the dataframe.

I've tried pivoting the data, but that drops the Gender and Age variables, which I want to keep.

I've also tried using groupby and summing the Cost column, but when I try to add the Gender and Age variables back in, it either produces NaNs or recreates the original table with multiple entries for the same user.

Gender and Age are constant within each user. What am I missing?

Answer 1

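You need groupby + agg: group by User and Gender, take the first Age in each group (it is the same on every row for a user), and sum Cost: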

df.groupby(['User','Gender']).agg({'Age':'first','Cost':'sum'}).reset_index()

   User Gender  Age  Cost
0     1      M   25    51
1     2      F   19    90

Or:

df.groupby(['User','Gender'], as_index=False).agg({'Age':'first','Cost':'sum'})
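
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question (the variable names df, result and alt are just illustrative) and runs the groupby + agg call from above. It also shows one possible variation, which assumes that Gender and Age really are constant for each user: put every constant column into the group keys and sum Cost only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User":    [1, 1, 1, 1, 2, 2, 2, 2],
    "Product": [1, 2, 3, 4, 2, 4, 6, 8],
    "Gender":  ["M", "M", "M", "M", "F", "F", "F", "F"],
    "Age":     [25, 25, 25, 25, 19, 19, 19, 19],
    "Cost":    [10, 12, 14, 15, 29, 14, 17, 30],
})

# Approach from the answer: group by User and Gender,
# keep the first Age per group and sum Cost.
result = (
    df.groupby(["User", "Gender"], as_index=False)
      .agg({"Age": "first", "Cost": "sum"})
)

# Variation (assumes Gender and Age never change within a user):
# use all constant columns as group keys and sum only Cost.
alt = df.groupby(["User", "Gender", "Age"], as_index=False)["Cost"].sum()

print(result)
print(alt)
```

Both calls give one row per user on this data. The difference shows up if a constant column is not actually constant: 'first' silently keeps the first value it sees, while the all-keys variation would produce an extra row for that user, which makes the inconsistency visible.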
