简体   繁体   中英

Pandas pivot table shape without aggregation

I would like to understand if I can shape a DataFrame to a multi-index and multi-header/multi-column (pivot) DataFrame without aggregation since this aggregation calculation is already present on the columns of my DataFrame.

I have the following DataFrame:

card_type           payment_status  airbnb                                     paid revenue - sum   revenue - min   debit - sum
American Express    Checked Out     Premium Queen Ensuite                      No   591.49          0.0             2
American Express    Checked Out     Queen Room w. Shared Facilities            No   255.52          0.0             2
American Express    Checked Out     Single Room w. Shared Facilities           No   1602.02         0.0             5
American Express    Confirmed       Compact Double Room w. Shared Facilities   No   189.05          0.0             1
American Express    Confirmed       Premium Queen Ensuite                      No   350.0           0.0             1
American Express    Confirmed       Queen Room w. Shared Facilities            Yes  110.53          0.0             1
American Express    Confirmed       Single Room w. Shared Facilities           No   4258.48         0.0             3
Mastercard          Cancelled       Queen Room w. Shared Facilities            No   28.5            0.0             3
Mastercard          Cancelled       Single Room w. Shared Facilities           Yes  578.55          0.0             2
Mastercard          Checked Out     Compact Double Room w. Shared Facilities   No   4637.71         0.0             22

...

df = pd.DataFrame.from_dict({
    'card_type': {0: 'American Express', 1: 'American Express', 2: 'American Express', 3: 'American Express', 4: 'American Express', 5: 'American Express', 6: 'American Express', 7: 'Mastercard', 8: 'Mastercard', 9: 'Mastercard'},
    'payment_status': {0: 'Checked Out', 1: 'Checked Out', 2: 'Checked Out', 3: 'Confirmed', 4: 'Confirmed', 5: 'Confirmed', 6: 'Confirmed', 7: 'Cancelled', 8: 'Cancelled', 9: 'Checked Out'},
    'airbnb': {0: 'Premium Queen Ensuite ', 1: 'Queen Room w. Shared Facilities ', 2: 'Single Room w. Shared Facilities ', 3: 'Compact Double Room w. Shared Facilities ', 4: 'Premium Queen Ensuite ', 5: 'Queen Room w. Shared Facilities ', 6: 'Single Room w. Shared Facilities ', 7: 'Queen Room w. Shared Facilities ', 8: 'Single Room w. Shared Facilities ', 9: 'Compact Double Room w. Shared Facilities '},
    'paid': {0: 'No', 1: 'No', 2: 'No', 3: 'No', 4: 'No', 5: 'Yes', 6: 'No', 7: 'No', 8: 'Yes', 9: 'No'},
    'revenue - sum': {0: 591.49, 1: 255.52, 2: 1602.02, 3: 189.05, 4: 350.0, 5: 110.53, 6: 4258.48,7: 28.5, 8: 578.55, 9: 4637.71},
    'revenue - min': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0},
    'debit - sum': {0: 2, 1: 2, 2: 5, 3: 1, 4: 1, 5: 1, 6: 3, 7: 3, 8: 2, 9: 22}})

I have used this method (based on Pandas Pivot table without aggregating ) to achieve (partially) the shape I'm looking. However, I would like to swap the aggfuncs label to the bottom (probably with https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.swaplevel.html ) and it doesn't feel right because my values are already previously calculated and we don't need to be calculated again:

df.pivot_table(index=["card_type", "payment_status"], columns=["airbnb", "paid"], values=["revenue - sum", "revenue - min", "debit - sum"], aggfunc={"revenue - sum": ["sum"], "revenue - min": ["max"], "debit - sum": ["mean"]}, fill_value="-")

What I expect to achieve is a DataFrame similar to this: 在此处输入图像描述

Any way I can get around with this? Thanks!

If you have already computed your values, you can use either:

  • pivot_table with aggfunc='first' and fill_value='_'
  • pivot and fillna('-')

For your column levels, use reorder_levels instead of swaplevel to rearrange colimns levels using input order: levels [0, 1, 2] to [1, 2, 0]:

out = df.pivot(index=["card_type", "payment_status"],
               columns=["airbnb", "paid"],
               values=["revenue - sum", "revenue - min", "debit - sum"]) \
        .fillna('-').reorder_levels([1, 2, 0], axis=1)

Output:

>>> out
airbnb                          Premium Queen Ensuite  Queen Room w. Shared Facilities  Single Room w. Shared Facilities   ... Compact Double Room w. Shared Facilities  Queen Room w. Shared Facilities  Single Room w. Shared Facilities 
paid                                                No                               No                                No  ...                                        No                              Yes                               Yes
                                         revenue - sum                    revenue - sum                     revenue - sum  ...                               debit - sum                      debit - sum                       debit - sum
card_type        payment_status                                                                                            ...                                                                                                             
American Express Checked Out                    591.49                           255.52                           1602.02  ...                                         -                                -                                 -
                 Confirmed                       350.0                                -                           4258.48  ...                                       1.0                              1.0                                 -
Mastercard       Cancelled                           -                             28.5                                 -  ...                                         -                                -                               2.0
                 Checked Out                         -                                -                                 -  ...                                      22.0                                -                                 -

Update

I would like to create one more level which results from the split of values by: "-"

As you have to break some columns names into two parts, use a different strategy. First, move some columns as index of your dataframe then explode your remain columns names into multi level. Finally, unstack your airbnb and paid index levels then rearrange the order of your column levels:

out = df.set_index(['card_type', 'payment_status', 'airbnb', 'paid'])
out.columns = out.columns.str.split(' - ').map(tuple)
out = out.unstack(['airbnb', 'paid'], fill_value='-') \
         .reorder_levels([2, 3, 0, 1], axis=1)

Output:

>>> out
airbnb                          Compact Double Room w. Shared Facilities          Premium Queen Ensuite   ... Queen Room w. Shared Facilities  Single Room w. Shared Facilities       
paid                                                                   No     Yes                     No  ...                              Yes                                No   Yes
                                                                  revenue revenue                revenue  ...                            debit                             debit debit
                                                                      sum     sum                    sum  ...                              sum                               sum   sum
card_type        payment_status                                                                           ...                                                                         
American Express Checked Out                                            -       -                 591.49  ...                                -                                 5     -
                 Confirmed                                         189.05       -                  350.0  ...                                1                                 3     -
Mastercard       Cancelled                                              -       -                      -  ...                                -                                 -     2
                 Checked Out                                      4637.71       -                      -  ...                                -                                 -     -

[4 rows x 24 columns]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM