Say I have a pivoted dataframe of the form
            Value                Qty              Code
Color        Blue Green   Red   Blue Green   Red  Blue Green   Red
Date
2017-07-01    0.0   1.1   0.0    0.0  12.0   0.0     0   abc     0
2017-07-03    2.3   1.3   0.0    3.0   1.0   0.0   cde   abc     0
2017-07-06    0.0   0.0   1.4    0.0   0.0   1.0     0     0   cde
I am interested in resampling the Date index to weekly frequency, applying a different aggregation to all sub-columns of each top-level column: Value: max, Qty: sum, Code: last. In a normal non-MultiIndex dataframe, df, one would do the following via the agg() function:
df.resample('W').agg({"Value":"max", "Qty":"sum", "Code":"last"})
But when I try it with the pivoted dataframe, it doesn't like the keys. How would I do it in the case of multi-index dataframe without explicitly specifying all the sub-columns?
The expected output is
            Value                Qty              Code
Color        Blue Green   Red   Blue Green   Red  Blue Green   Red
Date
2017-07-02    0.0   1.1   0.0    0.0  12.0   0.0     0   abc     0
2017-07-09    2.3   1.3   1.4    3.0   1.0   1.0     0     0   cde
To generate the above starting dataframe, use the following code:
from collections import OrderedDict
import pandas as pd

table = OrderedDict((
    ("Date", ["2017-07-01", "2017-07-03", "2017-07-03", "2017-07-06"]),
    ('Color', ['Green', 'Blue', 'Green', 'Red']),
    ('Value', [1.1, 2.3, 1.3, 1.4]),
    ('Qty', [12, 3, 1, 1]),
    ('Code', ['abc', 'cde', 'abc', 'cde'])
))
d = pd.DataFrame(table)
# Pivot so that Value/Qty/Code form the top column level and Color the second
p = d.pivot(index='Date', columns='Color')
p.index = pd.to_datetime(p.index)
p.fillna(0, inplace=True)
EDIT: Added desired result.
EDIT 2: I have also tried to create a dictionary to feed into the agg() function but it's coming out with 4 levels of column headers.
dc = dict(zip(p.columns, map({'Value': 'max', 'Qty': 'sum', 'Code': 'last'}.get, [x[0] for x in p.columns])))
newp = p.resample('W').agg(dc)
I believe you'll need to stack() to avoid the MultiIndex. There doesn't seem to be a way to specify level=0 in the agg method of a groupby or resample object, so this was the only way I could figure it out (let me know if this isn't accurate):
p.stack().reset_index(level=1).groupby(pd.Grouper(freq='W')).agg({'Value': 'max', 'Qty': 'sum', 'Code': 'last'})
             Qty  Value Code
Date
2017-07-02  12.0    1.1    0
2017-07-09   5.0    2.3  cde
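stack() moves the colors into the row index (axis 0), and resetting that level converts the MultiIndex back to a DatetimeIndex; the remainder is pretty straightforward. Note that this aggregates across all colors, so the result has one column per measure rather than one per color.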
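Since the colors end up in an ordinary column after the index reset, they can also be added to the grouping keys to keep one result per color. The following is a minimal sketch, assuming a hand-typed copy of the question's pivoted frame; the names long_df, weekly and wide are illustrative and not part of the original answer.

```python
import pandas as pd

# Hand-built copy of the pivoted frame from the question (assumed equivalent).
columns = pd.MultiIndex.from_product(
    [["Value", "Qty", "Code"], ["Blue", "Green", "Red"]], names=[None, "Color"]
)
index = pd.DatetimeIndex(["2017-07-01", "2017-07-03", "2017-07-06"], name="Date")
p = pd.DataFrame(
    [
        [0.0, 1.1, 0.0, 0.0, 12.0, 0.0, 0, "abc", 0],
        [2.3, 1.3, 0.0, 3.0, 1.0, 0.0, "cde", "abc", 0],
        [0.0, 0.0, 1.4, 0.0, 0.0, 1.0, 0, 0, "cde"],
    ],
    index=index,
    columns=columns,
)

# Move Color into the rows, then turn it into a regular column.
long_df = p.stack().reset_index(level=1)

# Group by week AND color so each color keeps its own aggregate.
weekly = long_df.groupby([pd.Grouper(freq="W"), "Color"]).agg(
    {"Value": "max", "Qty": "sum", "Code": "last"}
)

# Pivot Color back into the columns to recover the two-level layout.
wide = weekly.unstack("Color")
print(wide)
```

Depending on the pandas version, stack() may emit a FutureWarning about its new implementation; the sketch does not rely on either behaviour because the frame contains no missing values.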
EDIT
Does this work?
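dic = {'Value': 'max', 'Qty': 'sum', 'Code': 'last'}
df = pd.DataFrame()
# Resample each top-level column group with its own aggregation
for i in p.columns.get_level_values(0).unique():
    temp = p.xs(i, axis=1, level=0, drop_level=False).resample('W').agg(dic[i])
    df = pd.concat([df, temp], axis=1)
# Restore the original two-level columns once all groups are concatenated
df.columns = p.columns
df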
            Value                Qty              Code
Color        Blue Green   Red   Blue Green   Red  Blue Green   Red
Date
2017-07-02    0.0   1.1   0.0    0.0  12.0   0.0     0   abc     0
2017-07-09    2.3   1.3   1.4    3.0   1.0   1.0     0     0   cde
I don't know how "fail proof" this method is, so use caution. Setting df.columns = p.columns seems sketchy, but keeping the MultiIndex has been the major challenge. If I set levels=p.columns.levels in pd.concat() (which seems safer), it flattens the index to tuples, which could also be unpacked into a MultiIndex. I've tested this a few different ways and it seems to be fine.
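One way to sidestep the column reassignment, offered only as a sketch: select each top-level group with p[name], resample it on its own, and let pd.concat rebuild the top level from the dictionary keys. The frame below is typed in by hand to mirror the question's pivoted data, and the names funcs, pieces and out are illustrative rather than taken from the answer.

```python
import pandas as pd

# Hand-built copy of the pivoted frame from the question (assumed equivalent).
columns = pd.MultiIndex.from_product(
    [["Value", "Qty", "Code"], ["Blue", "Green", "Red"]], names=[None, "Color"]
)
index = pd.DatetimeIndex(["2017-07-01", "2017-07-03", "2017-07-06"], name="Date")
p = pd.DataFrame(
    [
        [0.0, 1.1, 0.0, 0.0, 12.0, 0.0, 0, "abc", 0],
        [2.3, 1.3, 0.0, 3.0, 1.0, 0.0, "cde", "abc", 0],
        [0.0, 0.0, 1.4, 0.0, 0.0, 1.0, 0, 0, "cde"],
    ],
    index=index,
    columns=columns,
)

# One aggregation per top-level column name.
funcs = {"Value": "max", "Qty": "sum", "Code": "last"}

# p[name] drops the top level, leaving a plain frame with one column per color.
pieces = {name: p[name].resample("W").agg(func) for name, func in funcs.items()}

# The dict keys become the top column level again, so no manual relabelling.
out = pd.concat(pieces, axis=1)

# Match the original column order in case funcs lists the groups differently.
out = out.reindex(columns=p.columns)
print(out)
```

Because the labels come from the dictionary keys, a mismatch between the aggregated columns and p.columns would show up as missing columns after the reindex instead of being silently relabelled.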
Consider first combining the hierarchical columns and running weekly aggregates by the different column types: Value, Qty, and Code.
# COMBINE THE LIST OF MULTI-LEVEL COLUMN (LIST OF TUPLES)
p.columns = [i[0]+i[1] for i in p.columns]
p.columns = p.columns.get_level_values(0)
# HORIZONTAL MERGE
out = pd.concat([p.resample('W').max()[[c for c in p.columns if 'Value' in c]],
                 p.resample('W').sum()[[c for c in p.columns if 'Qty' in c]],
                 p.resample('W').last()[[c for c in p.columns if 'Code' in c]]], axis=1)
print(out)
#             ValueBlue  ValueGreen  ValueRed  QtyBlue  QtyGreen  QtyRed CodeBlue CodeGreen CodeRed
# Date
# 2017-07-02        0.0         1.1       0.0      0.0      12.0     0.0        0       abc       0
# 2017-07-09        2.3         1.3       1.4      3.0       1.0     1.0        0         0     cde
To retain original hierarchical columns, save the column object before flattening the column levels and then re-assign back to columns after the resampling process:
pvtcolumns = p.columns
# ...same code as above
out.columns = pvtcolumns
print(out)
#             Value                Qty              Code
# Color        Blue Green   Red   Blue Green   Red  Blue Green   Red
# Date
# 2017-07-02    0.0   1.1   0.0    0.0  12.0   0.0     0   abc     0
# 2017-07-09    2.3   1.3   1.4    3.0   1.0   1.0     0     0   cde