
Reshaping Data Frame while keeping categorical variables

I've been trying to use a pivot table to reshape a dataframe that looks like this:

User  Product  Gender  Age  Cost
1     1        M       25   10
1     2        M       25   12
1     3        M       25   14
1     4        M       25   15
2     2        F       19   29
2     4        F       19   14
2     6        F       19   17
2     8        F       19   30

I want it to look like this:

User  Gender  Age  Cost
1     M       25   51
2     F       19   90

In other words, I want to sum Cost per User while retaining the rest of the categorical variables in the dataframe.

I've tried pivoting the data, but that drops the Gender and Age variables, which I want to keep.

I've also tried using groupby and summing the Cost column, but when I try to add in the Gender and Age variables it either produces NaNs or recreates the original table with multiple entries for the same user.

Gender and Age are constant for each user. What am I missing?

You need groupby + agg, taking the first Age in each group and summing Cost:

df.groupby(['User','Gender']).agg({'Age':'first','Cost':'sum'}).reset_index()

   User Gender  Age  Cost
0     1      M   25    51
1     2      F   19    90

Or, passing as_index=False instead of calling reset_index():

df.groupby(['User','Gender'], as_index=False).agg({'Age':'first','Cost':'sum'})
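
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question. The variable names `result` and `alt` are mine, not from the original post. The second variant, grouping on every column that is constant per user, is an alternative I am adding; it only gives one row per user if Gender and Age really never vary within a user.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "User":    [1, 1, 1, 1, 2, 2, 2, 2],
    "Product": [1, 2, 3, 4, 2, 4, 6, 8],
    "Gender":  ["M", "M", "M", "M", "F", "F", "F", "F"],
    "Age":     [25, 25, 25, 25, 19, 19, 19, 19],
    "Cost":    [10, 12, 14, 15, 29, 14, 17, 30],
})

# Approach from the answer: group by User and Gender,
# keep the first Age seen in each group, and sum Cost.
result = df.groupby(["User", "Gender"], as_index=False).agg(
    {"Age": "first", "Cost": "sum"}
)

# Alternative (an assumption that Gender and Age are constant per user):
# put every per-user constant column in the group keys and sum Cost only.
alt = df.groupby(["User", "Gender", "Age"], as_index=False)["Cost"].sum()

print(result)
print(alt)
```

Using 'first' for Age silently picks one value if a user ever has more than one, while the second variant would instead produce an extra row for that user, which makes inconsistent data easier to spot.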

