
Pivot Tables with Pandas

I have the following data saved as a pandas DataFrame:

Animal  Day  age  Food   kg
     1    1    3    17  0.1
     1    1    3    22  0.7
     1    2    3    17  0.8
     2    2    7    15  0.1
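For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the variable name `df` matches the code in the question, but the dtypes (integers for everything except `kg`) are an assumption read off the printed values.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: all columns are integers except "kg", which is a float.
df = pd.DataFrame(
    {
        "Animal": [1, 1, 1, 2],
        "Day": [1, 1, 2, 2],
        "age": [3, 3, 3, 7],
        "Food": [17, 22, 17, 15],
        "kg": [0.1, 0.7, 0.8, 0.1],
    }
)

print(df)
```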

With pivot I get the following:

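# Keyword arguments are used because pandas 2.0+ no longer accepts
# positional arguments in DataFrame.pivot.
output = df.pivot(index=["Animal", "Food"], columns="Day", values="kg") \
           .add_prefix("Day") \
           .reset_index() \
           .rename_axis(None, axis=1)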

>>> output
   Animal  Food  Day1  Day2
0       1    17   0.1   0.8
1       1    22   0.7   NaN
2       2    15   NaN   0.1

However, I would like the age column (and other columns) to remain included. It is also possible that the age value is not always the same for a given animal; in that case it doesn't matter which age value is taken.

   Animal  Food  Age  Day1  Day2
0       1    17    3   0.1   0.8
1       1    22    3   0.7   NaN
2       2    15    7   NaN   0.1

How do I need to change the code above?

IIUC, what you want is to pivot the weight but aggregate the age.

To my knowledge, you need to do the two operations separately: one with pivot, the other with groupby (here I used first for the example, but it could be any aggregation), and then join the results:

(df.pivot(index=["Animal", "Food"],
          columns="Day",
          values="kg",
         )
   .add_prefix('Day')
   .join(df.groupby(['Animal', 'Food'])['age'].first())
   .reset_index()
 )

I am adding an unambiguous example (here the age of Animal 1 changes on Day 2).

Input:

   Animal  Day  age  Food   kg
0       1    1    3    17  0.1
1       1    1    3    22  0.7
2       1    2    4    17  0.8
3       2    2    7    15  0.1

Output:

   Animal  Food  Day1  Day2  age
0       1    17   0.1   0.8    3
1       1    22   0.7   NaN    3
2       2    15   NaN   0.1    7
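To make the example above easy to run end to end, here is a self-contained sketch of the same pivot + groupby + join approach on the unambiguous input. The intermediate names `keys`, `wide`, `ages` and `result` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Unambiguous input: the age of Animal 1 changes from 3 to 4 on Day 2.
df = pd.DataFrame(
    {
        "Animal": [1, 1, 1, 2],
        "Day": [1, 1, 2, 2],
        "age": [3, 3, 4, 7],
        "Food": [17, 22, 17, 15],
        "kg": [0.1, 0.7, 0.8, 0.1],
    }
)

keys = ["Animal", "Food"]

# Step 1: pivot the weights so that each Day becomes its own column.
wide = df.pivot(index=keys, columns="Day", values="kg").add_prefix("Day")

# Step 2: reduce age to one value per (Animal, Food) group.
# "first" is an arbitrary choice; any aggregation would work here.
ages = df.groupby(keys)["age"].first()

# Step 3: join on the shared (Animal, Food) index and flatten it.
result = wide.join(ages).reset_index()

print(result)
```

Swapping `.first()` for `.max()`, `.min()` or `.mean()` changes only which age is reported, not the shape of the result.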

Use pivot and add the other columns to the index:

>>> df.pivot(index=df.columns[~df.columns.isin(['Day', 'kg'])].tolist(),
             columns='Day', values='kg') \
      .add_prefix('Day').reset_index().rename_axis(columns=None)

   Animal  age  Food  Day1  Day2
0       1    3    17   0.1   0.8
1       1    3    22   0.7   NaN
2       2    7    15   NaN   0.1
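One caveat that may be worth checking, since the question allows age to vary within an animal: with age in the index, every distinct (Animal, age, Food) combination gets its own row. The sketch below applies the index-based approach to the input where Animal 1 changes age on Day 2; the names `index_cols` and `out` are illustrative only.

```python
import pandas as pd

# Same data as the question, except Animal 1 has age 4 on Day 2.
df = pd.DataFrame(
    {
        "Animal": [1, 1, 1, 2],
        "Day": [1, 1, 2, 2],
        "age": [3, 3, 4, 7],
        "Food": [17, 22, 17, 15],
        "kg": [0.1, 0.7, 0.8, 0.1],
    }
)

# Every column except the pivoted ones becomes part of the index.
index_cols = df.columns[~df.columns.isin(["Day", "kg"])].tolist()

out = (
    df.pivot(index=index_cols, columns="Day", values="kg")
    .add_prefix("Day")
    .reset_index()
    .rename_axis(columns=None)
)

# Animal 1 / Food 17 now appears twice, once per age value.
print(out)
```

If a single row per (Animal, Food) is required regardless of age, aggregating age separately with groupby, as in the other answer, avoids the split.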
