
Pandas MultiIndex rearranging columns

When the label values are not in order and I use get_level_values to get the column values, the MultiIndex seems to rearrange the columns randomly.

For instance, I can create a MultiIndex whose labels are ordered from 0 to 4.

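import pandas as pd
import numpy as np
# Each label is a position in the level list: 0 -> 'ANA', 1 -> 'ANC', ...
# (`labels=` is the keyword in older pandas; since 0.24 it is called `codes=`.)
work_index=pd.MultiIndex(levels=[['ANA','ANC','PPI','SCAF','SAC'],['Sample']],labels=[[0,1,2,3,4],[0,0,0,0,0]])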

When I put this MultiIndex into a DataFrame and run get_level_values, I get ['ANA','ANC','PPI','SCAF','SAC'] in the order I expect.

work=pd.DataFrame(np.random.randn(5,5),columns=work_index)
work.columns #note the proper order
>>> MultiIndex(levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
           labels=[[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]])
work.columns.get_level_values(0) #same order as before
>>> Index(['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], dtype='object')
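For readers on a current pandas install: the `labels=` keyword used above was renamed to `codes=` in pandas 0.24 (the old name was later removed). A minimal sketch of the same working index, assuming pandas >= 0.24; with codes 0..4 each column points at the next level entry, so the level-0 values come out in the order the level was defined:

```python
import pandas as pd

# Same index as work_index, written with the newer `codes=` keyword.
work_index = pd.MultiIndex(
    levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
    codes=[[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]],
)

# Codes 0..4 visit the level entries in order, one per column.
print(list(work_index.get_level_values(0)))  # ['ANA', 'ANC', 'PPI', 'SCAF', 'SAC']
```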

However, if I create a MultiIndex whose labels are not in numerical order, get_level_values returns a weird, seemingly random order. Here, instead of [0,1,2,3,4], I use [2,1,4,3,0].

not_work_index=pd.MultiIndex(levels=[['ANA','ANC','PPI','SCAF','SAC'],['Sample']],labels=[[2, 1, 4, 3, 0],[0,0,0,0,0]])

Putting this into a DataFrame does not give me ['ANA','ANC','PPI','SCAF','SAC'] but ['PPI','ANC','SAC','SCAF','ANA']:

not_work=pd.DataFrame(np.random.randn(5,5),columns=not_work_index)
not_work.columns
>>> MultiIndex(levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
       labels=[[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]])
not_work.columns.get_level_values(0)
>>> Index(['PPI', 'ANC', 'SAC', 'SCAF', 'ANA'], dtype='object')
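The order above can be traced back to the labels: each column's level-0 value is the entry of the first level at that column's label, so labels [2, 1, 4, 3, 0] pick 'PPI', 'ANC', 'SAC', 'SCAF', 'ANA'. A small sketch that decodes the labels by hand and compares the result with get_level_values, assuming pandas >= 0.24 (where `labels` is called `codes`):

```python
import pandas as pd

not_work_index = pd.MultiIndex(
    levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
    codes=[[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]],
)

level0 = not_work_index.levels[0]   # level values in the order they were defined
codes0 = not_work_index.codes[0]    # one integer per column, pointing into level0

# Look up each column's code in the level, column by column.
decoded = [level0[code] for code in codes0]
print(decoded)  # ['PPI', 'ANC', 'SAC', 'SCAF', 'ANA']

# get_level_values(0) yields the same per-column values.
print(decoded == list(not_work_index.get_level_values(0)))  # True
```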

Is there a way for get_level_values to return the levels in order even if the labels are not in order? Or is there another way to query the upper level to get the columns in the correct order?

I'm not sure if this is a bug or not. It looks like get_level_values does not follow the order in which the level values were defined, while the MultiIndex itself still stores the level values and the labels. You can reorder the values using the following gnarly code, which uses the labels array to index the level values:

In [11]:
not_work.columns.get_level_values(0)[not_work.columns.labels[0]]

Out[11]:
Index(['SAC', 'ANC', 'ANA', 'SCAF', 'PPI'], dtype='object')
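To see what the expression above computes: get_level_values(0) has already looked each label up in the level, and indexing that result with the same labels reorders it a second time. A sketch of the two steps, assuming pandas >= 0.24 (`codes` in place of `labels`); note that neither result matches the order of the level itself:

```python
import pandas as pd

idx = pd.MultiIndex(
    levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
    codes=[[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]],
)
codes0 = idx.codes[0]

once = idx.levels[0].take(codes0)   # same values as idx.get_level_values(0)
twice = once.take(codes0)           # what the expression above computes

print(list(once))   # ['PPI', 'ANC', 'SAC', 'SCAF', 'ANA']
print(list(twice))  # ['SAC', 'ANC', 'ANA', 'SCAF', 'PPI']
print(twice.equals(idx.get_level_values(0)[codes0]))  # True
```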

Here I access the labels attribute of the columns (a MultiIndex):

In [12]:
not_work.columns.labels

Out[12]:
FrozenList([[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]])

I then take the labels of the first level using [0]:

In [13]:
not_work.columns.labels[0]

Out[13]:
FrozenNDArray([2, 1, 4, 3, 0], dtype='int8')
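On pandas 0.24 and later the `labels` attribute shown above is called `codes`. A minimal sketch of the same two lookups under that assumption (`.tolist()` is used only to print plain Python integers):

```python
import pandas as pd

idx = pd.MultiIndex(
    levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
    codes=[[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]],
)

# idx.codes plays the role of `labels` above: one integer array per level.
print(len(idx.codes))         # 2
# Codes of the first level.
print(idx.codes[0].tolist())  # [2, 1, 4, 3, 0]
```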

We can then use these labels to index the level values:

In [11]:
not_work.columns.get_level_values(0)[not_work.columns.labels[0]]

Out[11]:
Index(['SAC', 'ANC', 'ANA', 'SCAF', 'PPI'], dtype='object')
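If the goal is the one in the question, a DataFrame whose columns follow the order in which the level values were defined (['ANA', 'ANC', 'PPI', 'SCAF', 'SAC']), one alternative is to read that order from the level directly and to sort the column positions by their labels. A sketch, assuming pandas >= 0.24 (`codes`) and NumPy's stable argsort; this is a different technique from the indexing shown above:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex(
    levels=[['ANA', 'ANC', 'PPI', 'SCAF', 'SAC'], ['Sample']],
    codes=[[2, 1, 4, 3, 0], [0, 0, 0, 0, 0]],
)
not_work = pd.DataFrame(np.random.randn(5, 5), columns=idx)

# The level itself holds the values in the order they were defined.
print(list(not_work.columns.levels[0]))  # ['ANA', 'ANC', 'PPI', 'SCAF', 'SAC']

# Sorting column positions by their level-0 code puts the columns in level order.
order = np.argsort(not_work.columns.codes[0], kind='stable')
reordered = not_work.iloc[:, order]
print(list(reordered.columns.get_level_values(0)))
# ['ANA', 'ANC', 'PPI', 'SCAF', 'SAC']
```

Because `iloc` moves whole columns, each column keeps its data; only the column order changes.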

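Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.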

 