Reshaping Data Frame while keeping categorical variables

I've been trying to use a pivot table to reshape a dataframe that looks like this:

User  Product  Gender  Age  Cost
1     1        M       25   10
1     2        M       25   12
1     3        M       25   14
1     4        M       25   15
2     2        F       19   29
2     4        F       19   14
2     6        F       19   17
2     8        F       19   30

I want it to look like this:

User  Gender  Age  Cost
1     M       25   51
2     F       19   90

In other words, I want to sum Cost per User while keeping the other categorical variables (Gender and Age) in the dataframe.

I've tried pivoting the data, but that drops the Gender and Age columns, which I want to keep.

I've also tried a groupby that sums the Cost column, but when I add Gender and Age it either produces NaNs or recreates the original table, with multiple rows for the same user.

Gender and Age are the same on every row for a given user. What am I missing?

You need groupby + agg. Group by User and Gender, take the first Age in each group, and sum Cost:

df.groupby(['User','Gender']).agg({'Age':'first','Cost':'sum'}).reset_index()

   User Gender  Age  Cost
0     1      M   25    51
1     2      F   19    90

Or:

df.groupby(['User','Gender'], as_index=False).agg({'Age':'first','Cost':'sum'})
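
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the approach above. The variable names `by_agg` and `by_keys` are only illustrative. The second variant is not from the answer: it assumes, as the question states, that Gender and Age never vary within a User, so all three columns can serve as group keys and only Cost needs aggregating.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User":    [1, 1, 1, 1, 2, 2, 2, 2],
    "Product": [1, 2, 3, 4, 2, 4, 6, 8],
    "Gender":  ["M", "M", "M", "M", "F", "F", "F", "F"],
    "Age":     [25, 25, 25, 25, 19, 19, 19, 19],
    "Cost":    [10, 12, 14, 15, 29, 14, 17, 30],
})

# Approach from the answer: group on User and Gender,
# keep the first Age seen in each group, and sum Cost.
by_agg = df.groupby(["User", "Gender"], as_index=False).agg(
    {"Age": "first", "Cost": "sum"}
)

# Alternative sketch (assumption: Gender and Age are constant per User):
# use every constant column as a group key and sum only Cost.
by_keys = df.groupby(["User", "Gender", "Age"], as_index=False)["Cost"].sum()

print(by_agg)
print(by_keys)
```

Both variants give one row per user with the columns User, Gender, Age and Cost. If Gender or Age could differ within a user, the second variant would split that user into several rows, while 'first' would silently keep only the first value, so it may be worth checking that assumption on real data.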
