
Sort pandas MultiIndex

I have created a DataFrame with a MultiIndex from another DataFrame:

import pandas as pd

# df is an existing DataFrame holding the five key columns plus 'val'
arrays = [df['bus_uid'], df['bus_type'], df['type'],
          df['obj_uid'], df['datetime']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['bus_uid', 'bus_type', 'type',
                                                 'obj_uid', 'datetime'])
multindexed_df = pd.DataFrame(df['val'].values, index=index)

This worked fine, as described in the documentation: http://pandas.pydata.org/pandas-docs/stable/advanced.html

The documentation also says, under "The need for sortedness with MultiIndex", that the labels need to be sorted for indexing and slicing to work correctly.

But somehow

multindexed_df.sort_index(level=0)

or

multindexed_df.sort_index(level='bus_uid')

does not work anymore and throws TypeError: sort_index() got an unexpected keyword argument 'level'.

Looking up the documentation of sort_index(), it looks as if "by" is my new friend instead of "level":

by:object
  Column name(s) in frame. Accepts a column name or a list for a nested sort. A tuple will be interpreted as the levels of a multi-index.

My question is: how can I sort my MultiIndex so that all functionality (slicing, etc.) works correctly?

The answer depends on the pandas version you are working with. With recent pandas (>= 0.17.0), you can indeed use the level keyword to specify which level of the MultiIndex to sort by:

df = df.sort_index(level=0)
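As a minimal sketch of what the level keyword does, the following uses a small made-up frame. The two-level index and its values are illustrative assumptions, not the asker's data; only the level name bus_uid is borrowed from the question.

```python
import pandas as pd

# Toy data: level names echo the question, values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [("b2", "x"), ("b1", "y"), ("b2", "w"), ("b1", "z")],
    names=["bus_uid", "obj_uid"],
)
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=index)

# Sort on bus_uid only; the other level is not used as a tie-breaker.
by_first = df.sort_index(level="bus_uid", sort_remaining=False)

# Default (sort_remaining=True): ties on bus_uid are broken by the remaining levels.
by_first_then_rest = df.sort_index(level="bus_uid")
```

The level can be given by position (level=0) or by name (level="bus_uid"); with the default sort_remaining=True the result is fully lexicographically sorted.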

If you have an older pandas (< 0.17.0), the level keyword is not yet available, but you can use the sortlevel method instead:

df = df.sortlevel(level=0)
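If the same code has to run on both sides of the 0.17.0 boundary, one possible approach is a small wrapper that tries the newer call first. This is only a sketch: sort_by_level is a hypothetical helper name, and the fallback relies on the TypeError quoted in the question. As far as I know, sortlevel was later deprecated and is gone from pandas 1.0 onwards, so on current versions only the first branch will ever run.

```python
import pandas as pd


def sort_by_level(frame, level=0):
    """Sort frame by one index level (hypothetical compatibility helper)."""
    try:
        # pandas >= 0.17.0: sort_index() accepts the level keyword.
        return frame.sort_index(level=level)
    except TypeError:
        # pandas < 0.17.0: sort_index() rejects level, so fall back to sortlevel().
        return frame.sortlevel(level=level)


# Made-up example data for illustration.
index = pd.MultiIndex.from_tuples(
    [("b2", "x"), ("b1", "y"), ("b2", "w"), ("b1", "z")],
    names=["bus_uid", "obj_uid"],
)
frame = pd.DataFrame({"val": [1, 2, 3, 4]}, index=index)
result = sort_by_level(frame, level=0)
```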

Note that if you want to sort all levels, you don't need to specify the level keyword at all; you can just do:

df = df.sort_index()

This works on both recent and older versions of pandas.
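To check that the index really is sorted, and that slicing then behaves, something like the following can be used on a reasonably recent pandas. The frame is made-up illustration data, and is_monotonic_increasing is just one way to test sortedness.

```python
import pandas as pd

# Made-up example data; only the level names come from the question.
index = pd.MultiIndex.from_tuples(
    [("b2", "x"), ("b1", "y"), ("b2", "w"), ("b1", "z")],
    names=["bus_uid", "obj_uid"],
)
df = pd.DataFrame({"val": [1, 2, 3, 4]}, index=index)

sorted_before = df.index.is_monotonic_increasing  # index starts out unsorted

# Sort on all levels at once.
df = df.sort_index()
sorted_after = df.index.is_monotonic_increasing

# Label-based range slicing across both levels (end points are inclusive).
subset = df.loc[("b1", "y"):("b2", "w")]
```

On an unsorted MultiIndex the same range slice would typically fail with an error about the index not being lexsorted, which is the situation the documentation section mentioned in the question warns about.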


For a summary of these changes in the sorting API, see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#changes-to-sorting-api

