Python Pandas: drop columns from multi-indexed dataframe. Column removed, but column name remains

Question

I am trying to create a view of a multi-indexed dataframe. I am wondering why the column name remains even after the column is removed.

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'x': [2, 2, 2, 2, 12, 12, 12, 12],
    'y': [5.91, 4.43, 5.22, 1.31, 6.32, 6.78, 4.65, 1.98],
    'z': [18.61, 17.60, 18.27, 16.18, 16.81, 16.37, 67.07, 46.00]})

pivot_df = df.pivot_table(index=['id'],columns=['x'],values=['y','z'])

[output]
>>> pivot_df
       y            z       
x     2     12     2      12
id                          
1   5.91   NaN  18.61    NaN
2   4.43   NaN  17.60    NaN
3   5.22   NaN  18.27    NaN
4   1.31   NaN  16.18    NaN
5    NaN  6.32    NaN  16.81
6    NaN  6.78    NaN  16.37
7    NaN  4.65    NaN  67.07
8    NaN  1.98    NaN  46.00

>>> pivot_df.columns
MultiIndex(levels=[['y', 'z'], [2, 12]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[None, 'x'])

In the above code, I can see ['y', 'z'] at level 0 which is expected. Now I try to get rid of columns under 'z'.

new_pivot_df = pivot_df.drop('z',axis=1,level=0)

[output]
>>> new_pivot_df
       y      
x     2     12
id            
1   5.91   NaN
2   4.43   NaN
3   5.22   NaN
4   1.31   NaN
5    NaN  6.32
6    NaN  6.78
7    NaN  4.65
8    NaN  1.98

>>> new_pivot_df.columns
MultiIndex(levels=[['y', 'z'], [2, 12]],
           labels=[[0, 0], [0, 1]],
           names=[None, 'x'])

In the above code, new_pivot_df shows that 'z' was dropped. However, when I check new_pivot_df.columns I still see 'z' in the column names. I would like to understand why that is the case, and I am looking for an elegant suggestion to remove a column (data AND name) from a multi-indexed dataframe.

Thank you in advance.

Answer 1

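Use MultiIndex.remove_unused_levels(), available since pandas 0.20.0. drop() only removes the entries that pointed at 'z'; the stored levels are left untouched, which is why 'z' is still listed in new_pivot_df.columns. Rebuilding the levels removes it: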

new_pivot_df.columns = new_pivot_df.columns.remove_unused_levels()
new_pivot_df.columns

Output:

MultiIndex(levels=[['y'], [2, 12]],
           labels=[[0, 0], [0, 1]],
           names=[None, 'x'])
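
To tie the question and the answer together, here is a minimal end-to-end sketch that reuses the data from the question. The variable names (dropped, cleaned, stale_levels, used_labels, clean_levels) are only illustrative and do not come from the original post. It also shows that get_level_values() reports only the labels actually in use, which can be enough if you just need to inspect the remaining column names.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8],
                   'x': [2, 2, 2, 2, 12, 12, 12, 12],
                   'y': [5.91, 4.43, 5.22, 1.31, 6.32, 6.78, 4.65, 1.98],
                   'z': [18.61, 17.60, 18.27, 16.18, 16.81, 16.37, 67.07, 46.00]})

pivot_df = df.pivot_table(index=['id'], columns=['x'], values=['y', 'z'])

# Drop every column under 'z' on the outer column level.
dropped = pivot_df.drop('z', axis=1, level=0)

# .levels is the stored lookup table; as the question shows, it can
# still list 'z' even though no column refers to it any more.
stale_levels = list(dropped.columns.levels[0])

# get_level_values() only reports labels that are actually in use.
used_labels = list(dropped.columns.get_level_values(0).unique())

# Rebuild the levels so that unused entries disappear.
cleaned = dropped.copy()
cleaned.columns = cleaned.columns.remove_unused_levels()
clean_levels = list(cleaned.columns.levels[0])

print(stale_levels, used_labels, clean_levels)
```

remove_unused_levels() returns a new MultiIndex rather than modifying the existing one, so the result has to be assigned back to .columns as in the answer above.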

solution1
5 ACCEPTED 2017-05-22 19:30:58
