
Pandas DataFrame MultiIndex Pivot - Remove Empty Headers and Axis Rows

This is closely related to a question I asked earlier, "Python Pandas Dataframe Pivot Table Column and Values Order". Thanks again for the help; it was very much appreciated.

I'm trying to automate a report that will be distributed via email to a large audience so it needs to look "pretty" :)

I'm having trouble resetting or removing the index and axis labels after pivoting, which I need to do before I can use the .style CSS functions (i.e. create a Styler object from the df) to make the table look nice.

I have a DataFrame where two of the principal fields (in my example, "Name" and "Bucket") will vary. The desired display order will also change, so it can't be hard-coded, but it can be derived earlier in the application (e.g. as "Name_Rank" and "Bucket_Rank") as integer "sorting values" that are easy to sort on and can, in theory, be dropped later.

I can drop the sorting-value column, but not the corresponding header row (axis level?). Additionally, no matter what I try, I can't get rid of the blank row between the headers and the data.

I think I need to set the index to "Bucket" and the headers to "Name" and "Price/Change" in order to use the .style Styler functionality properly.

    import pandas as pd
    import numpy as np

    data = [
        ['AAA', 2, 'X', 3, 5, 1],
        ['AAA', 2, 'Y', 1, 10, 2],
        ['AAA', 2, 'Z', 2, 15, 3],
        ['BBB', 3, 'X', 3, 15, 3],
        ['BBB', 3, 'Y', 1, 10, 2],
        ['BBB', 3, 'Z', 2, 5, 1],
        ['CCC', 1, 'X', 3, 10, 2],
        ['CCC', 1, 'Y', 1, 15, 3],
        ['CCC', 1, 'Z', 2, 5, 1],
    ]

    df = pd.DataFrame(
        data,
        columns=['Name', 'Name_Rank', 'Bucket', 'Bucket_Rank', 'Price', 'Change'],
    )

    display(df)

       Name  Name_Rank  Bucket  Bucket_Rank  Price  Change
    0   AAA          2       X            3      5       1
    1   AAA          2       Y            1     10       2
    2   AAA          2       Z            2     15       3
    3   BBB          3       X            3     15       3
    4   BBB          3       Y            1     10       2
    5   BBB          3       Z            2      5       1
    6   CCC          1       X            3     10       2
    7   CCC          1       Y            1     15       3
    8   CCC          1       Z            2      5       1

Based on the prior question/answer I can pretty much get the table into the right format:

    df2 = (
        pd.pivot_table(df, values=['Price', 'Change'],
                       index=['Bucket_Rank', 'Bucket'],
                       columns=['Name_Rank', 'Name'], aggfunc=np.mean)
        .swaplevel(1, 0, axis=1)                        # move Name_Rank to the top level
        .sort_index(level=0, axis=1)                    # order the names by rank
        .reindex(['Price', 'Change'], level=1, axis=1)  # Price before Change
        .swaplevel(2, 1, axis=1)                        # Name above Price/Change
        .rename_axis(columns=[None, None, None])
        .reset_index()
        .drop('Bucket_Rank', axis=1)
        .set_index('Bucket')
        .rename_axis(columns=[None, None, None])
    )

which looks like this:

                 1               2               3
               CCC             AAA             BBB
             Price  Change   Price  Change   Price  Change
    Bucket
    Y           15       3      10       2      10       2
    Z            5       1      15       3       5       1
    X           10       2       5       1      15       3

Ok, so...

A) How do I get rid of the header row (axis level?) that used to be "Name_Rank", i.e. the integer "sorting values" 1, 2, 3? I figured out a hack where the df is exported to XLS and re-imported with Header=(1,2), but that can't be the best way to accomplish the objective.

B) How do I get rid of the blank row above the data in the table? From what I've read online, it seems you should use "rename_axis=[None]", but this doesn't seem to work no matter which order I try.

C) Is there a way to set the headers so that both the former "Name" row and the "Price/Change" row are headers, so that the .style functionality can be used to format them separately from the data in the table below?

Thanks a lot for whatever suggestions anyone might have. I'm totally stuck!

Cheers, Devon
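
For A and B, one option that does not depend on the Styler is to change the labels on the DataFrame itself. The "blank" row in the output above appears to be the row that displays the index name ("Bucket"), which would explain why rename_axis(columns=...) has no effect on it; clearing the index name removes it. A minimal sketch, assuming the sample data from the question, with aggfunc='mean' swapped in for np.mean and droplevel used instead of the reset_index/drop/set_index round trip (the names pivoted and clean are only illustrative):

```python
import pandas as pd

data = [
    ['AAA', 2, 'X', 3, 5, 1], ['AAA', 2, 'Y', 1, 10, 2], ['AAA', 2, 'Z', 2, 15, 3],
    ['BBB', 3, 'X', 3, 15, 3], ['BBB', 3, 'Y', 1, 10, 2], ['BBB', 3, 'Z', 2, 5, 1],
    ['CCC', 1, 'X', 3, 10, 2], ['CCC', 1, 'Y', 1, 15, 3], ['CCC', 1, 'Z', 2, 5, 1],
]
df = pd.DataFrame(
    data,
    columns=['Name', 'Name_Rank', 'Bucket', 'Bucket_Rank', 'Price', 'Change'],
)

# Same pivot and ordering steps as in the question.
pivoted = (
    pd.pivot_table(df, values=['Price', 'Change'],
                   index=['Bucket_Rank', 'Bucket'],
                   columns=['Name_Rank', 'Name'], aggfunc='mean')
    .swaplevel(1, 0, axis=1)
    .sort_index(level=0, axis=1)
    .reindex(['Price', 'Change'], level=1, axis=1)
    .swaplevel(2, 1, axis=1)
)

clean = (
    pivoted
    .droplevel(0, axis=1)  # A: drop the Name_Rank header level; column order is kept
    .droplevel(0, axis=0)  # drop Bucket_Rank from the row index; row order is kept
    # B: clear the "Bucket" index name and the remaining column level names
    .rename_axis(index=None, columns=[None, None])
)

print(clean)
```

The ranks are only removed after they have done their sorting work, so the display order derived from "Name_Rank" and "Bucket_Rank" is preserved in clean.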

In pandas 1.4.0 the options for A and B are directly available using the Styler.hide method:

