
Selecting the 2nd MultiIndex Level of Pandas DataFrame as an Indexer

I have a pandas DataFrame with a MultiIndex, and I want to select all rows between 11am and 1pm.

import pandas as pd

data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 11:00:00', 13),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-01 13:00:00', 15),
    ('Jack', '2020-01-01 14:00:00', 16),
    ('Ryan', '2020-01-01 10:00:00', 34),
    ('Ryan', '2020-01-01 11:00:00', 35),
    ('Ryan', '2020-01-01 12:00:00', 36),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-01 14:00:00', 38),
]
df = pd.DataFrame(data, columns=['name', 'datetime', 'score']).set_index(['name','datetime'])
#                           score
# name datetime                  
# Jack 2020-01-01 10:00:00     12
#      2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
#      2020-01-01 14:00:00     16
# Ryan 2020-01-01 10:00:00     34
#      2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37
#      2020-01-01 14:00:00     38

My current solution converts all MultiIndex levels to regular columns, builds a DatetimeIndex from the datetime column, uses it as an indexer to select the desired rows, and then rebuilds the MultiIndex.

df = df.reset_index()
indexer = pd.DatetimeIndex(df['datetime'])
df = df.loc[indexer.indexer_between_time('11:00', '13:00')].set_index(['name', 'datetime'])
#                           score
# name datetime                  
# Jack 2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
# Ryan 2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37

Question: Is it possible to use the 2nd level of the MultiIndex directly as the indexer, avoiding the reset_index and set_index round trip?

Or is there an even better way to filter rows between two times of day?

I am using Python 3.7.4 and pandas 0.25.1. I am willing to upgrade to newer versions if they allow better solutions.

You can use the index directly with get_level_values and pd.IndexSlice:

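# Integer positions of rows whose 'datetime' level falls between 11:00 and 13:00
indexer = (pd.DatetimeIndex(df.index.get_level_values('datetime'))
           .indexer_between_time('11:00', '13:00'))
# Select those labels on the second level, keeping every name on the first level
df.loc[pd.IndexSlice[:, df.index.get_level_values('datetime')[indexer]], :]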

                          score
name datetime                  
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37
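
Since indexer_between_time returns integer row positions, a possibly simpler variant is to pass them straight to iloc instead of mapping them back to labels. The following is a minimal sketch on a shortened version of the question's data, not part of the original answer; the names positions and result are chosen here for illustration.

```python
import pandas as pd

# Shortened version of the question's data
data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 11:00:00', 13),
    ('Jack', '2020-01-01 13:00:00', 15),
    ('Jack', '2020-01-01 14:00:00', 16),
    ('Ryan', '2020-01-01 10:00:00', 34),
    ('Ryan', '2020-01-01 12:00:00', 36),
]
df = (pd.DataFrame(data, columns=['name', 'datetime', 'score'])
        .set_index(['name', 'datetime']))

# Integer positions of the rows whose time of day lies in [11:00, 13:00]
positions = (pd.DatetimeIndex(df.index.get_level_values('datetime'))
             .indexer_between_time('11:00', '13:00'))

# Positional selection leaves the MultiIndex untouched
result = df.iloc[positions]
print(result)
```

This avoids both the reset_index/set_index round trip and the label lookup through pd.IndexSlice.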

Alternatively, slice the second level by label range:

df.loc[(slice(None), slice('2020-01-01 11:00:00', '2020-01-01 13:00:00')), :]

Output:

                          score
name datetime                  
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37
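
Note that the label slice above selects one contiguous range between two full timestamps, so it covers a single date. If the data spans several days, a boolean mask on the time of day may be a better fit. The sketch below assumes a hypothetical second day (2020-01-02) that is not in the question's data, and the names times, mask and result are illustrative.

```python
import datetime as dt
import pandas as pd

# Sample rows; the 2020-01-02 entries are hypothetical, added to show multi-day data
data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-02 11:30:00', 20),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-02 14:00:00', 40),
]
df = (pd.DataFrame(data, columns=['name', 'datetime', 'score'])
        .set_index(['name', 'datetime']))

# Time of day for each row, taken from the second index level
times = pd.to_datetime(df.index.get_level_values('datetime')).time

# Keep rows between 11:00 and 13:00 (inclusive) regardless of the date
mask = (times >= dt.time(11, 0)) & (times <= dt.time(13, 0))
result = df[mask]
print(result)
```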

