
Pandas filtering - between_time on a non-index column

I need to filter data by specific hours of the day. The DataFrame method between_time seems to be the proper way to do that; however, it only works on the index of the DataFrame, and I need to keep the data in its original format (e.g. pivot tables will expect the datetime column to exist under its proper name, not as the index).

This means that each filter looks something like this:

df.set_index(keys='my_datetime_field').between_time('8:00','21:00').reset_index()

This implies two reindexing operations every time such a filter is run.

Is this good practice, or is there a more appropriate way to do the same thing?

Create a DatetimeIndex, but store it in a variable, not in the DataFrame. Then call its indexer_between_time method. This returns an integer array that can be used to select rows from df with iloc:

import pandas as pd
import numpy as np

N = 100
df = pd.DataFrame(
    {'date': pd.date_range('2000-1-1', periods=N, freq='H'),
     'value': np.random.random(N)})

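# Build the DatetimeIndex separately so that df keeps 'date' as a regular column
index = pd.DatetimeIndex(df['date'])
# indexer_between_time returns the integer positions of rows whose time of day
# falls between 8:00 and 21:00 (both ends included by default)
df.iloc[index.indexer_between_time('8:00','21:00')]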
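
To check that the indexer approach selects the same rows as the set_index / reset_index round trip from the question, the two can be run side by side. The sketch below is only an illustration: the sample data, the seed, and the variable names (by_indexer, by_reindex, by_mask) are assumptions, not part of the original post. It also shows a third option that is not in the answer above, a plain boolean mask built from the column's .dt.time values. That mask compares Python datetime.time objects, so it may be slower on large frames, and unlike between_time it does not handle ranges that wrap past midnight.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical sample data: 100 hourly timestamps starting at 2000-01-01 00:00.
n = 100
df = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=n, freq=pd.Timedelta(hours=1)),
    "value": np.random.default_rng(0).random(n),
})

# Approach from the answer: integer positions from a standalone DatetimeIndex.
# The DataFrame itself is never reindexed and keeps its original row labels.
positions = pd.DatetimeIndex(df["date"]).indexer_between_time("8:00", "21:00")
by_indexer = df.iloc[positions]

# Approach from the question: move the column into the index, filter, move it back.
# The result gets a fresh 0..k-1 index.
by_reindex = df.set_index("date").between_time("8:00", "21:00").reset_index()

# Alternative (an assumption, not from the original answer): a boolean mask
# built from the time-of-day component of the column.
times = df["date"].dt.time
mask = (times >= dt.time(8, 0)) & (times <= dt.time(21, 0))
by_mask = df[mask]

# All three select the same rows; only the resulting row labels differ.
same_as_reindex = by_indexer.reset_index(drop=True).equals(by_reindex)
same_as_mask = by_indexer.equals(by_mask)
print(len(by_indexer), same_as_reindex, same_as_mask)
```

Note that by_indexer and by_mask keep the original row labels of df, while the round trip through set_index and reset_index renumbers the rows from zero.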

