Pandas pivot table shape without aggregation
I would like to understand whether I can reshape a DataFrame into a multi-index, multi-header (multi-column pivot) DataFrame without aggregation, since the aggregated values are already present in the columns of my DataFrame.
I have the following DataFrame:
card_type payment_status airbnb paid revenue - sum revenue - min debit - sum
American Express Checked Out Premium Queen Ensuite No 591.49 0.0 2
American Express Checked Out Queen Room w. Shared Facilities No 255.52 0.0 2
American Express Checked Out Single Room w. Shared Facilities No 1602.02 0.0 5
American Express Confirmed Compact Double Room w. Shared Facilities No 189.05 0.0 1
American Express Confirmed Premium Queen Ensuite No 350.0 0.0 1
American Express Confirmed Queen Room w. Shared Facilities Yes 110.53 0.0 1
American Express Confirmed Single Room w. Shared Facilities No 4258.48 0.0 3
Mastercard Cancelled Queen Room w. Shared Facilities No 28.5 0.0 3
Mastercard Cancelled Single Room w. Shared Facilities Yes 578.55 0.0 2
Mastercard Checked Out Compact Double Room w. Shared Facilities No 4637.71 0.0 22
...
import pandas as pd

df = pd.DataFrame.from_dict({
'card_type': {0: 'American Express', 1: 'American Express', 2: 'American Express', 3: 'American Express', 4: 'American Express', 5: 'American Express', 6: 'American Express', 7: 'Mastercard', 8: 'Mastercard', 9: 'Mastercard'},
'payment_status': {0: 'Checked Out', 1: 'Checked Out', 2: 'Checked Out', 3: 'Confirmed', 4: 'Confirmed', 5: 'Confirmed', 6: 'Confirmed', 7: 'Cancelled', 8: 'Cancelled', 9: 'Checked Out'},
'airbnb': {0: 'Premium Queen Ensuite ', 1: 'Queen Room w. Shared Facilities ', 2: 'Single Room w. Shared Facilities ', 3: 'Compact Double Room w. Shared Facilities ', 4: 'Premium Queen Ensuite ', 5: 'Queen Room w. Shared Facilities ', 6: 'Single Room w. Shared Facilities ', 7: 'Queen Room w. Shared Facilities ', 8: 'Single Room w. Shared Facilities ', 9: 'Compact Double Room w. Shared Facilities '},
'paid': {0: 'No', 1: 'No', 2: 'No', 3: 'No', 4: 'No', 5: 'Yes', 6: 'No', 7: 'No', 8: 'Yes', 9: 'No'},
'revenue - sum': {0: 591.49, 1: 255.52, 2: 1602.02, 3: 189.05, 4: 350.0, 5: 110.53, 6: 4258.48,7: 28.5, 8: 578.55, 9: 4637.71},
'revenue - min': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
'debit - sum': {0: 2, 1: 2, 2: 5, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 22}})
I have used this method (based on "Pandas Pivot table without aggregating") to achieve (partially) the shape I'm looking for. However, I would like to move the aggfunc labels to the bottom level (probably with swaplevel, https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html ), and it doesn't feel right because my values have already been calculated and don't need to be calculated again:
df.pivot_table(index=["card_type", "payment_status"], columns=["airbnb", "paid"], values=["revenue - sum", "revenue - min", "debit - sum"], aggfunc={"revenue - sum": ["sum"], "revenue - min": ["max"], "debit - sum": ["mean"]}, fill_value="-")
What I expect to achieve is a DataFrame similar to this:
Is there any way to get around this? Thanks!
If you have already computed your values, you can use either:
pivot_table with aggfunc='first' and fill_value='_'
pivot and fillna('-')
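To illustrate why neither option recomputes anything, here is a minimal sketch on a small made-up frame. The `toy` data and its column names are assumptions for illustration, not the data from the question. It relies on every index/column combination appearing only once, which pivot requires anyway.

```python
import pandas as pd

# Hypothetical toy data: one row per (card, status, room) combination.
toy = pd.DataFrame({
    "card": ["Amex", "Amex", "Visa"],
    "status": ["Confirmed", "Cancelled", "Confirmed"],
    "room": ["Queen", "Single", "Queen"],
    "revenue - sum": [100.0, 50.0, 75.0],
})

# pivot only reshapes; it raises ValueError if a combination is duplicated.
a = toy.pivot(index=["card", "status"], columns="room", values="revenue - sum")

# pivot_table with aggfunc='first' takes the single existing value per cell.
b = toy.pivot_table(index=["card", "status"], columns="room",
                    values="revenue - sum", aggfunc="first")

print(a)
print(b)
```

With unique combinations, both calls should place the same precomputed numbers in the same cells. The practical difference is that pivot fails loudly on duplicates, while aggfunc='first' would silently keep one of them.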
For your column levels, use reorder_levels instead of swaplevel to rearrange the column levels by their position: levels [0, 1, 2] become [1, 2, 0]:
out = df.pivot(index=["card_type", "payment_status"],
columns=["airbnb", "paid"],
values=["revenue - sum", "revenue - min", "debit - sum"]) \
.fillna('-').reorder_levels([1, 2, 0], axis=1)
Output:
>>> out
airbnb Premium Queen Ensuite Queen Room w. Shared Facilities Single Room w. Shared Facilities ... Compact Double Room w. Shared Facilities Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No No No ... No Yes Yes
revenue - sum revenue - sum revenue - sum ... debit - sum debit - sum debit - sum
card_type payment_status ...
American Express Checked Out 591.49 255.52 1602.02 ... - - -
Confirmed 350.0 - 4258.48 ... 1.0 1.0 -
Mastercard Cancelled - 28.5 - ... - - 2.0
Checked Out - - - ... 22.0 - -
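The difference between reorder_levels and swaplevel can be sketched on a tiny frame with three column levels. The level names and values below are hypothetical, chosen only to make the result easy to read.

```python
import pandas as pd

# Hypothetical three-level column index: (value name, room, paid).
cols = pd.MultiIndex.from_tuples(
    [("revenue - sum", "Queen", "No"), ("debit - sum", "Queen", "No")],
    names=["value", "room", "paid"],  # assumed names, for illustration
)
toy = pd.DataFrame([[100.0, 2]], columns=cols)

# reorder_levels places all levels in one call: [0, 1, 2] -> [1, 2, 0].
reordered = toy.reorder_levels([1, 2, 0], axis=1)

# swaplevel exchanges only two levels at a time, so two calls are needed.
swapped = toy.swaplevel(0, 1, axis=1).swaplevel(1, 2, axis=1)

print(list(reordered.columns.names))
print(list(swapped.columns.names))
```

Both routes should end with the value names on the bottom level; reorder_levels simply states the target order directly.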
Update
I would like to create one more level which results from the split of values by: "-"
Since you have to split some column names into two parts, use a different strategy. First, move the key columns into the index of your DataFrame, then split the remaining column names into a multi-level column index. Finally, unstack the airbnb and paid index levels and rearrange the order of the column levels:
out = df.set_index(['card_type', 'payment_status', 'airbnb', 'paid'])
out.columns = out.columns.str.split(' - ').map(tuple)
out = out.unstack(['airbnb', 'paid'], fill_value='-') \
.reorder_levels([2, 3, 0, 1], axis=1)
Output:
>>> out
airbnb Compact Double Room w. Shared Facilities Premium Queen Ensuite ... Queen Room w. Shared Facilities Single Room w. Shared Facilities
paid No Yes No ... Yes No Yes
revenue revenue revenue ... debit debit debit
sum sum sum ... sum sum sum
card_type payment_status ...
American Express Checked Out - - 591.49 ... - 5 -
Confirmed 189.05 - 350.0 ... 1 3 -
Mastercard Cancelled - - - ... - - 2
Checked Out 4637.71 - - ... - - -
[4 rows x 24 columns]
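The column-splitting step can be looked at in isolation. The following is a minimal sketch on a made-up frame (the `toy` data and the level names "measure" and "agg" are assumptions, not part of the question) showing how flat names such as 'revenue - sum' turn into a two-level column index.

```python
import pandas as pd

# Hypothetical toy data with flat 'measure - agg' column names.
toy = pd.DataFrame({
    "card": ["Amex", "Visa"],
    "revenue - sum": [100.0, 75.0],
    "revenue - min": [0.0, 5.0],
    "debit - sum": [2, 3],
}).set_index("card")

# Split each name on ' - ' and convert the resulting lists to tuples;
# mapping an Index to equal-length tuples yields a MultiIndex.
toy.columns = toy.columns.str.split(" - ").map(tuple)

# Optional: name the new levels (assumed names) so that later calls
# can refer to them by name instead of by position.
toy.columns.names = ["measure", "agg"]

print(toy.columns.nlevels)
print(toy[("revenue", "min")].tolist())
```

Naming the levels is a design choice rather than a requirement; the positional [2, 3, 0, 1] order used in the answer works without names.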