Selecting the 2nd MultiIndex Level of Pandas DataFrame as an Indexer
I have a pandas DataFrame with a MultiIndex, and I want to select all rows between 11am and 1pm.
import pandas as pd
data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 11:00:00', 13),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-01 13:00:00', 15),
    ('Jack', '2020-01-01 14:00:00', 16),
    ('Ryan', '2020-01-01 10:00:00', 34),
    ('Ryan', '2020-01-01 11:00:00', 35),
    ('Ryan', '2020-01-01 12:00:00', 36),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-01 14:00:00', 38),
]
df = pd.DataFrame(data, columns=['name', 'datetime', 'score']).set_index(['name','datetime'])
#                           score
# name datetime
# Jack 2020-01-01 10:00:00     12
#      2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
#      2020-01-01 14:00:00     16
# Ryan 2020-01-01 10:00:00     34
#      2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37
#      2020-01-01 14:00:00     38
My current solution converts the MultiIndex levels back to regular columns, builds an indexer from the datetime column, and uses that indexer to select the desired rows. The MultiIndex is then rebuilt.
df = df.reset_index()
indexer = pd.DatetimeIndex(df['datetime'])
df = df.loc[indexer.indexer_between_time('11:00', '13:00')].set_index(['name', 'datetime'])
#                           score
# name datetime
# Jack 2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
# Ryan 2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37
Question: Is it possible to use the 2nd level of the MultiIndex directly as the indexer, thus avoiding the calls to reset_index and set_index?
Or is there an even better way to filter rows between two different times?
I am using Python 3.7.4 and pandas 0.25.1. I am willing to upgrade to newer versions if they allow better solutions.
You can use the index directly with get_level_values and pd.IndexSlice:
indexer = (pd.DatetimeIndex(df.index.get_level_values('datetime'))
           .indexer_between_time('11:00', '13:00'))
df.loc[pd.IndexSlice[:, df.index.get_level_values('datetime')[indexer]], :]
                          score
name datetime
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37
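Since indexer_between_time returns integer positions, one possible simplification is to pass those positions straight to iloc instead of mapping them back to labels. The following is a minimal sketch under that assumption; it reuses the question's sample data, and the names times, positions and result are illustrative only.

```python
import pandas as pd

data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 11:00:00', 13),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-01 13:00:00', 15),
    ('Jack', '2020-01-01 14:00:00', 16),
    ('Ryan', '2020-01-01 10:00:00', 34),
    ('Ryan', '2020-01-01 11:00:00', 35),
    ('Ryan', '2020-01-01 12:00:00', 36),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-01 14:00:00', 38),
]
df = (pd.DataFrame(data, columns=['name', 'datetime', 'score'])
        .set_index(['name', 'datetime']))

# Parse the second index level (stored as strings here) into timestamps.
times = pd.DatetimeIndex(df.index.get_level_values('datetime'))

# Integer positions of the rows whose time of day lies in [11:00, 13:00];
# both endpoints are included by default.
positions = times.indexer_between_time('11:00', '13:00')

# Positional selection leaves the MultiIndex untouched.
result = df.iloc[positions]
print(result)
```

This keeps both index levels in place, so no reset_index or set_index call is needed.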
df.loc[(slice(None),slice('2020-01-01 11:00:00','2020-01-01 13:00:00')),:]
output:
                          score
name datetime
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37
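Note that a range over full timestamps selects one contiguous span, which coincides with a time-of-day filter here only because the sample data covers a single day. The sketch below illustrates the difference on hypothetical two-day data. The rows and the names df2, times, in_range and by_time are invented for illustration, and parsing the column with pd.to_datetime before building the index is an assumption not made in the question.

```python
import pandas as pd

# Hypothetical data spanning two days (not from the question).
data2 = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-02 09:00:00', 21),
    ('Jack', '2020-01-02 12:30:00', 22),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-02 14:00:00', 48),
]
df2 = pd.DataFrame(data2, columns=['name', 'datetime', 'score'])
df2['datetime'] = pd.to_datetime(df2['datetime'])  # real timestamps
df2 = df2.set_index(['name', 'datetime'])

# The level now holds timestamps, so it comes back as a DatetimeIndex.
times = df2.index.get_level_values('datetime')

# A contiguous timestamp range also keeps the 09:00 row on the second day.
in_range = df2[(times >= pd.Timestamp('2020-01-01 11:00:00'))
               & (times <= pd.Timestamp('2020-01-02 13:00:00'))]

# A time-of-day filter keeps only rows between 11:00 and 13:00 on any day.
by_time = df2.iloc[times.indexer_between_time('11:00', '13:00')]

print(in_range['score'].tolist())
print(by_time['score'].tolist())
```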