I would like to understand if I can shape a DataFrame to a multi-index and multi-header/multi-column (pivot) DataFrame without aggregation since this aggregation calculation is already present on the columns of my DataFrame.
I have the following DataFrame:
card_type payment_status airbnb paid revenue - sum revenue - min debit - sum
American Express Checked Out Premium Queen Ensuite No 591.49 0.0 2
American Express Checked Out Queen Room w. Shared Facilities No 255.52 0.0 2
American Express Checked Out Single Room w. Shared Facilities No 1602.02 0.0 5
American Express Confirmed Compact Double Room w. Shared Facilities No 189.05 0.0 1
American Express Confirmed Premium Queen Ensuite No 350.0 0.0 1
American Express Confirmed Queen Room w. Shared Facilities Yes 110.53 0.0 1
American Express Confirmed Single Room w. Shared Facilities No 4258.48 0.0 3
Mastercard Cancelled Queen Room w. Shared Facilities No 28.5 0.0 3
Mastercard Cancelled Single Room w. Shared Facilities Yes 578.55 0.0 2
Mastercard Checked Out Compact Double Room w. Shared Facilities No 4637.71 0.0 22
...
df = pd.DataFrame.from_dict({
'card_type': {0: 'American Express', 1: 'American Express', 2: 'American Express', 3: 'American Express', 4: 'American Express', 5: 'American Express', 6: 'American Express', 7: 'Mastercard', 8: 'Mastercard', 9: 'Mastercard'},
'payment_status': {0: 'Checked Out', 1: 'Checked Out', 2: 'Checked Out', 3: 'Confirmed', 4: 'Confirmed', 5: 'Confirmed', 6: 'Confirmed', 7: 'Cancelled', 8: 'Cancelled', 9: 'Checked Out'},
'airbnb': {0: 'Premium Queen Ensuite ', 1: 'Queen Room w. Shared Facilities ', 2: 'Single Room w. Shared Facilities ', 3: 'Compact Double Room w. Shared Facilities ', 4: 'Premium Queen Ensuite ', 5: 'Queen Room w. Shared Facilities ', 6: 'Single Room w. Shared Facilities ', 7: 'Queen Room w. Shared Facilities ', 8: 'Single Room w. Shared Facilities ', 9: 'Compact Double Room w. Shared Facilities '},
'paid': {0: 'No', 1: 'No', 2: 'No', 3: 'No', 4: 'No', 5: 'Yes', 6: 'No', 7: 'No', 8: 'Yes', 9: 'No'},
'revenue - sum': {0: 591.49, 1: 255.52, 2: 1602.02, 3: 189.05, 4: 350.0, 5: 110.53, 6: 4258.48,7: 28.5, 8: 578.55, 9: 4637.71},
'revenue - min': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'debit - sum': {0: 2, 1: 2, 2: 5, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 22}})
I have used this method (based on Pandas Pivot table without aggregating ) to achieve (partially) the shape I'm looking. However, I would like to swap the aggfuncs label to the bottom (probably with https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html ) and it doesn't feel right because my values are already previously calculated and we don't need to be calculated again:
df.pivot_table(index=["card_type", "payment_status"], columns=["airbnb", "paid"], values=["revenue - sum", "revenue - min", "debit - sum"], aggfunc={"revenue - sum": ["sum"], "revenue - min": ["max"], "debit - sum": ["mean"]}, fill_value="-")
What I expect to achieve is a DataFrame similar to this:
Any way I can get around with this? Thanks!
If you have already computed your values, you can use either:
pivot_table
with aggfunc='first'
and fill_value='_'
pivot
and fillna('-')
For your column levels, use reorder_levels
instead of swaplevel
to rearrange colimns levels using input order: levels [0, 1, 2] to [1, 2, 0]:
out = df.pivot(index=["card_type", "payment_status"],
columns=["airbnb", "paid"],
values=["revenue - sum", "revenue - min", "debit - sum"]) \
.fillna('-').reorder_levels([1, 2, 0], axis=1)
Output:
>>> out
airbnb Premium Queen Ensuite Queen Room w. Shared Facilities Single Room w. Shared Facilities ... Compact Double Room w. Shared Facilities Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No No No ... No Yes Yes
revenue - sum revenue - sum revenue - sum ... debit - sum debit - sum debit - sum
card_type payment_status ...
American Express Checked Out 591.49 255.52 1602.02 ... - - -
Confirmed 350.0 - 4258.48 ... 1.0 1.0 -
Mastercard Cancelled - 28.5 - ... - - 2.0
Checked Out - - - ... 22.0 - -
Update
I would like to create one more level which results from the split of values by: "-"
As you have to break some columns names into two parts, use a different strategy. First, move some columns as index of your dataframe then explode your remain columns names into multi level. Finally, unstack your airbnb
and paid
index levels then rearrange the order of your column levels:
out = df.set_index(['card_type', 'payment_status', 'airbnb', 'paid'])
out.columns = out.columns.str.split(' - ').map(tuple)
out = out.unstack(['airbnb', 'paid'], fill_value='-') \
.reorder_levels([2, 3, 0, 1], axis=1)
Output:
>>> out
airbnb Compact Double Room w. Shared Facilities Premium Queen Ensuite ... Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No Yes No ... Yes No Yes
revenue revenue revenue ... debit debit debit
sum sum sum ... sum sum sum
card_type payment_status ...
American Express Checked Out - - 591.49 ... - 5 -
Confirmed 189.05 - 350.0 ... 1 3 -
Mastercard Cancelled - - - ... - - 2
Checked Out 4637.71 - - ... - - -
[4 rows x 24 columns]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.