I have the following dataframe in pandas:
date        prod  hourly_bucket      tank  trans       flag
01-01-2019  TP    05:00:00-06:00:00  2     Preset      Peak
01-01-2019  TP    05:00:00-06:00:00  2     Preset      Peak
01-01-2019  TP    05:00:00-06:00:00  2     Non Preset  Peak
02-01-2019  TP    05:00:00-06:00:00  2     Preset      Lean
02-01-2019  TP    05:00:00-06:00:00  2     Preset      Lean
02-01-2019  TP    05:00:00-06:00:00  2     Non Preset  Lean
My desired dataframe would aggregate at the day and tank level, then count how many Preset and Non Preset transactions fall in Lean and Peak hours:
date tank Lean_Non_Preset Lean_Preset Peak_Non_Preset Peak_Preset
01-01-2019 2 1 2 1 2
I am doing the following in pandas:
lean_peak_preset_cnt = df.pivot_table(index=['date','tank'], columns=['flag'],values=['trans'],aggfunc='count').reset_index()
But it does not give me the required result.
Add 'trans' to the columns parameter, then flatten the MultiIndex in the columns with map and join:
lean_peak_preset_cnt = df.pivot_table(index=['date', 'tank'],
                                      columns=['flag', 'trans'],
                                      aggfunc='size',
                                      fill_value=0)
lean_peak_preset_cnt.columns = lean_peak_preset_cnt.columns.map('_'.join)
lean_peak_preset_cnt = lean_peak_preset_cnt.reset_index()
print(lean_peak_preset_cnt)
         date  tank  Lean_Non Preset  Lean_Preset  Peak_Non Preset  Peak_Preset
0  01-01-2019     2                0            0                1            2
1  02-01-2019     2                1            2                0            0
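For a self-contained check, the approach above can be run end to end on the sample rows from the question. This is only a sketch: the way the DataFrame is built here is an assumption about how the data is stored (strings everywhere except an integer tank), and the `replace(' ', '_')` step is an extra that is not part of the answer, added so the labels follow the underscore style of the desired output.

```python
import pandas as pd

# Sample rows copied from the question (column dtypes are an assumption).
df = pd.DataFrame({
    'date': ['01-01-2019'] * 3 + ['02-01-2019'] * 3,
    'prod': ['TP'] * 6,
    'hourly_bucket': ['05:00:00-06:00:00'] * 6,
    'tank': [2] * 6,
    'trans': ['Preset', 'Preset', 'Non Preset'] * 2,
    'flag': ['Peak'] * 3 + ['Lean'] * 3,
})

# Count rows per (date, tank) for every (flag, trans) combination.
counts = df.pivot_table(index=['date', 'tank'],
                        columns=['flag', 'trans'],
                        aggfunc='size',
                        fill_value=0)

# Join the two column levels, then swap spaces for underscores
# so 'Non Preset' becomes 'Non_Preset' (extra step, not in the answer).
counts.columns = ['_'.join(col).replace(' ', '_') for col in counts.columns]
counts = counts.reset_index()
print(counts)
```

Whether to keep the space in `Non Preset` is a matter of taste; the underscore version is only easier to use with attribute-style access such as `counts.Lean_Non_Preset`.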
You were almost there:
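piv = (df.pivot_table(index=['date', 'tank'], columns=['trans', 'flag'],
                      aggfunc='size', fill_value=0))
# Flatten the MultiIndex columns into (trans, flag) tuples.
# Older pandas versions could use piv.columns.ravel() here.
piv.columns = piv.columns.to_flat_index()
The size aggregation gives the counts you want, fill_value=0 fills combinations that never occur with 0, and index and columns specify how the rows are grouped. See the pivot_table docs for more details. to_flat_index collapses the MultiIndex columns into a single level of tuples.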
#                 (Nonpreset, Lean)  (Nonpreset, Peak)  (Preset, Lean)  (Preset, Peak)
#date       tank
#01-01-2019 2                     0                  1               0               2
#02-01-2019 2                     1                  0               2               0
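If plain string labels are preferred over tuples, each (trans, flag) pair can be rebuilt into a `flag_trans` name. The following is a minimal sketch under a few assumptions: the sample frame is reconstructed from the question, and the naming scheme (flag first, spaces replaced by underscores) is chosen to match the desired output rather than taken from the answer.

```python
import pandas as pd

# Sample rows copied from the question (column dtypes are an assumption).
df = pd.DataFrame({
    'date': ['01-01-2019'] * 3 + ['02-01-2019'] * 3,
    'prod': ['TP'] * 6,
    'hourly_bucket': ['05:00:00-06:00:00'] * 6,
    'tank': [2] * 6,
    'trans': ['Preset', 'Preset', 'Non Preset'] * 2,
    'flag': ['Peak'] * 3 + ['Lean'] * 3,
})

piv = df.pivot_table(index=['date', 'tank'], columns=['trans', 'flag'],
                     aggfunc='size', fill_value=0)

# Iterating over MultiIndex columns yields (trans, flag) tuples;
# rebuild each one as "flag_trans" with underscores instead of spaces.
piv.columns = [f"{flag}_{trans}".replace(' ', '_') for trans, flag in piv.columns]
piv = piv.reset_index()
print(piv)
```

The columns keep the pivot's sort order (by trans, then flag), so they can be reordered with ordinary column selection if a specific layout is needed.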