How to aggregate in a pivot table in pandas

I have the following dataframe in pandas:

   code     date         tank      nozzle       qty      amount
   123      2018-01-01   1         1            100      0
   123      2018-01-01   1         2            0        50
   123      2018-01-01   1         2            0        50
   123      2018-01-01   1         2            100      0 
   123      2018-01-02   1         1            0        70
   123      2018-01-02   1         1            0        50
   123      2018-01-02   1         2            100      0

My desired dataframe is:

   code     date         tank      nozzle_1_qty   nozzle_2_qty   nozzle_1_amount   nozzle_2_amount
   123      2018-01-01   1         100            100            0                 100
   123      2018-01-02   1         0              100            120               0

I am doing the following in pandas:

df= (df.pivot_table(index=['date', 'tank'], columns='nozzle',
                     values=['qty','amount']).add_prefix('nozzle_')
         .reset_index()
      )

But this does not give me my desired output.

The default aggregation function in pivot_table is the mean, so you need to change it to sum (and add code to the index to keep that column), then flatten the resulting MultiIndex columns with a list comprehension:

df = df.pivot_table(index=['code','date', 'tank'], 
                    columns='nozzle', 
                    values=['qty','amount'], aggfunc='sum')
# Python 3.6+ (f-strings)
df.columns = [f'nozzle_{b}_{a}' for a, b in df.columns]
# Python below 3.6
# df.columns = ['nozzle_{}_{}'.format(b, a) for a, b in df.columns]
df = df.reset_index()
print (df)
   code        date  tank  nozzle_1_amount  nozzle_2_amount  nozzle_1_qty  \
0   123  2018-01-01     1                0              100           100   
1   123  2018-01-02     1              120                0             0   

   nozzle_2_qty  
0           100  
1           100  
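
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and then reorders the columns to match the desired layout. The names `wide` and `desired_order` are my own and not part of the original answer, and the explicit reordering step is an addition, since pivot_table does not return the columns in the order the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123] * 7,
    "date": ["2018-01-01"] * 4 + ["2018-01-02"] * 3,
    "tank": [1] * 7,
    "nozzle": [1, 2, 2, 2, 1, 1, 2],
    "qty": [100, 0, 0, 100, 0, 0, 100],
    "amount": [0, 50, 50, 0, 70, 50, 0],
})

# Sum (instead of the default mean) within each code/date/tank/nozzle group.
wide = df.pivot_table(
    index=["code", "date", "tank"],
    columns="nozzle",
    values=["qty", "amount"],
    aggfunc="sum",
)

# Each column is a (measure, nozzle) tuple; flatten it to nozzle_<n>_<measure>.
wide.columns = [f"nozzle_{nozzle}_{measure}" for measure, nozzle in wide.columns]
wide = wide.reset_index()

# Hypothetical extra step: put the columns in the order shown in the question.
desired_order = [
    "code", "date", "tank",
    "nozzle_1_qty", "nozzle_2_qty",
    "nozzle_1_amount", "nozzle_2_amount",
]
wide = wide[desired_order]
print(wide)
```

Selecting the columns by an explicit list keeps the result independent of whatever ordering pivot_table happens to apply.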

I don't use pivot_table much in pandas, but you can get your result using groupby and some reshaping.

df = df.groupby(['code', 'date', 'tank', 'nozzle']).sum().unstack()

The columns will be a MultiIndex that you might want to rename.
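
One possible way to do that rename is sketched below, assuming the same sample data as in the question. The variable name `grouped` and the explicit `[["qty", "amount"]]` selection are my own additions. The flattening reuses the list-comprehension idea from the pivot_table answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "code": [123] * 7,
    "date": ["2018-01-01"] * 4 + ["2018-01-02"] * 3,
    "tank": [1] * 7,
    "nozzle": [1, 2, 2, 2, 1, 1, 2],
    "qty": [100, 0, 0, 100, 0, 0, 100],
    "amount": [0, 50, 50, 0, 70, 50, 0],
})

# Sum per group, then move the nozzle level from the index into the columns.
grouped = (
    df.groupby(["code", "date", "tank", "nozzle"])[["qty", "amount"]]
      .sum()
      .unstack()
)

# Columns are now (measure, nozzle) tuples; flatten them to plain strings.
grouped.columns = [f"nozzle_{nozzle}_{measure}" for measure, nozzle in grouped.columns]

# Turn code/date/tank back into ordinary columns.
grouped = grouped.reset_index()
print(grouped)
```

If some date has no rows for a given nozzle, unstack leaves NaN in that cell. Passing `fill_value=0` to `unstack` is one way to handle that.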

