Pandas Dataframe slice from MultiIndex has too many levels

I have a dataframe that was sliced from a larger dataframe:

df
Out[47]: 
                           price  log_price  dlog_price
data_source_id trade_date                              
1              2014-03-05  174.4   5.161352   -2.089993

As you can see, the dataframe has 1 row in it.

However, the levels of its index still hold thousands of values, which appear to be carried over from the parent:

df.index
Out[48]: 
MultiIndex(levels=[[1, 2, 4, 5, 6, 7, 8, 9], [1990-01-01 00:00:00, 1990-01-02 00:00:00, 1990-01-03 00:00:00, 1990-01-04 00:00:00, 1990-01-05 00:00:00, 1990-01-08 00:00:00, 1990-01-09 00:00:00, 1990-01-10 00:00:00, 1990-01-11 00:00:00, 1990-01-12 00:00:00, 1990-01-15 00:00:00, 1990-01-16 00:00:00, 1990-01-17 00:00:00, 1990-01-18 00:00:00, 1990-01-19 00:00:00, 1990-01-22 00:00:00, 1990-01-23 00:00:00, 1990-01-24 00:00:00, 1990-01-25 00:00:00, 1990-01-26 00:00:00, 1990-01-29 00:00:00, 1990-01-30 00:00:00, 1990-01-31 00:00:00, 1990-02-01 00:00:00, 1990-02-02 00:00:00, 1990-02-05 00:00:00, 1990-02-06 00:00:00, 1990-02-07 00:00:00, 1990-02-08 00:00:00, 1990-02-09 00:00:00, 1990-02-12 00:00:00, 1990-02-13 00:00:00, 1990-02-14 00:00:00, 1990-02-15 00:00:00, 1990-02-16 00:00:00, 1990-02-19 00:00:00, 1990-02-20 00:00:00, 1990-02-21 00:00:00, 1990-02-22 00:00:00, 1990-02-23 00:00:00, 1990-02-26 00:00:00, 1990-02-27 00:00:00, 1990-02-28 00:00:00, 1990-03-01 00:00:00, 1990-03-02 00:00:00, 1990-03-05 00:00:00, 1990-03-06 00:00:00, 1990-03-07 00:00:00, 1990-03-08 00:00:00, 1990-03-09 00:00:00, 1990-03-12 00:00:00, 1990-03-13 00:00:00, 1990-03-14 00:00:00, 1990-03-15 00:00:00, 1990-03-16 00:00:00, 1990-03-19 00:00:00, 1990-03-20 00:00:00, 1990-03-21 00:00:00, 1990-03-22 00:00:00, 1990-03-23 00:00:00, 1990-03-26 00:00:00, 1990-03-27 00:00:00, 1990-03-28 00:00:00, 1990-03-29 00:00:00, 1990-03-30 00:00:00, 1990-04-02 00:00:00, 1990-04-03 00:00:00, 1990-04-04 00:00:00, 1990-04-05 00:00:00, 1990-04-06 00:00:00, 1990-04-09 00:00:00, 1990-04-10 00:00:00, 1990-04-11 00:00:00, 1990-04-12 00:00:00, 1990-04-13 00:00:00, 1990-04-16 00:00:00, 1990-04-17 00:00:00, 1990-04-18 00:00:00, 1990-04-19 00:00:00, 1990-04-20 00:00:00, 1990-04-23 00:00:00, 1990-04-24 00:00:00, 1990-04-25 00:00:00, 1990-04-26 00:00:00, 1990-04-27 00:00:00, 1990-04-30 00:00:00, 1990-05-01 00:00:00, 1990-05-02 00:00:00, 1990-05-03 00:00:00, 1990-05-04 00:00:00, 1990-05-07 00:00:00, 1990-05-08 00:00:00, 1990-05-09 00:00:00, 
1990-05-10 00:00:00, 1990-05-11 00:00:00, 1990-05-14 00:00:00, 1990-05-15 00:00:00, 1990-05-16 00:00:00, 1990-05-17 00:00:00, 1990-05-18 00:00:00, ...]],
           labels=[[0], [6308]],
           names=['data_source_id', 'trade_date'])
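For reference, the behaviour can be reproduced on a small synthetic frame. This is only a sketch: the frame, its values, and the names `parent` and `child` are made up for illustration and are not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame: 2 data sources x 5 business days (made-up values).
dates = pd.date_range("2014-03-03", periods=5, freq="B")
idx = pd.MultiIndex.from_product(
    [[1, 2], dates], names=["data_source_id", "trade_date"]
)
parent = pd.DataFrame({"price": np.arange(10, dtype=float)}, index=idx)

# Take a single row. The row count shrinks, but the level arrays are
# shared with the parent and keep every original value.
child = parent.iloc[[2]]

n_rows = len(child)
level_sizes = [len(level) for level in child.index.levels]
print(n_rows, level_sizes)  # 1 [2, 5]
```

The slice only narrows the integer positions that point into each level; the level arrays themselves are left untouched.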

How can I clean up the MultiIndex so that it doesn't carry all of these unused level values?

This appears to work, but it is a bit messy:

df2 = df.reset_index().set_index(df.index.names)

df2.index
Out[53]: 
MultiIndex(levels=[[1], [2014-03-05 00:00:00]],
           labels=[[0], [0]],
           names=['data_source_id', 'trade_date'])

You can do:

df.index = pd.MultiIndex.from_tuples(df.index.values, names=df.index.names)

Alternatively:

>>> arr = list(map(df.index.get_level_values, range(df.index.nlevels)))
>>> df.index = pd.MultiIndex.from_arrays(arr)
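Newer pandas releases also provide `MultiIndex.remove_unused_levels()`, which may be a tidier way to get the same result without rebuilding the index by hand. A minimal sketch, assuming a pandas version that has this method and using a made-up frame (`parent` and `child` are hypothetical names, not the original data):

```python
import pandas as pd

# Hypothetical parent frame: 2 data sources x 5 business days (made-up values).
dates = pd.date_range("2014-03-03", periods=5, freq="B")
idx = pd.MultiIndex.from_product(
    [[1, 2], dates], names=["data_source_id", "trade_date"]
)
parent = pd.DataFrame({"price": range(10)}, index=idx)

# One-row slice; copy so the index can be reassigned without touching parent.
child = parent.iloc[[2]].copy()
sizes_before = [len(level) for level in child.index.levels]

# Drop every level value that no row refers to; level names are preserved.
child.index = child.index.remove_unused_levels()
sizes_after = [len(level) for level in child.index.levels]
names_after = list(child.index.names)

print(sizes_before, sizes_after)  # [2, 5] [1, 1]
```

The row labels themselves are unchanged; only the backing level arrays are trimmed to the values actually in use.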
