
Python: How to develop a method similar to between_time on pandas 0.9.0?

I am stuck with pandas 0.9.0 because I'm working under Python 2.5, so the between_time method is not available to me.

I have a DataFrame indexed by dates and would like to keep only the rows whose time of day falls within a given window, e.g. between 08:00 and 09:00, on every date in the DataFrame df:

import pandas as pd
import numpy as np
import datetime

# 10-minute timestamps from 2009-08-01 to 2012-08-01, one random 'Power' column
dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])

How can I write a method that provides the same functionality as between_time?

NB: The original problem I am trying to solve is described in "Python: Filter DataFrame in Pandas by hour, day and month grouped by year".

UPDATE:

Try using:

df.iloc[df.index.indexer_between_time('08:00','09:50')]  # returns integer positions, so use .iloc
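To see what indexer_between_time actually returns, one can compare it with between_time on a small toy frame. This is a sketch for a newer pandas (0.11 or later, which has .iloc; it will not run on 0.9.0), and the dates below are made up for illustration.

```python
import numpy as np
import pandas as pd

# two full days plus one midnight of 10-minute samples (toy data)
dates = pd.date_range(start="2009-08-01", end="2009-08-03", freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=["Power"])

# integer *positions* of rows whose time of day lies within [08:00, 09:50]
pos = df.index.indexer_between_time("08:00", "09:50")
subset = df.iloc[pos]

# on versions that have it, between_time selects exactly the same rows
assert subset.equals(df.between_time("08:00", "09:50"))
print(len(subset))  # 24: 12 rows (08:00 ... 09:50) on each of the two full days
```

Because the result is positional, it also works unchanged on an index with duplicate timestamps.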

OLD answer:

I'm not sure it will work on pandas 0.9.0, but it's worth trying:

df[(df.index.hour >= 8) & (df.index.hour <= 9)]

PS: please be aware that this is not the same as between_time. It checks only the hour, so it keeps every timestamp from 08:00 through 09:59, whereas between_time can also compare minutes and seconds, e.g. df.between_time('08:01:15','09:13:28').
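If DatetimeIndex.time is available in your pandas version (I have not verified this for 0.9.0), comparing it against datetime.time objects gives the minute/second precision that the hour-only filter lacks. A sketch on a one-day toy frame, reproducing the '08:01:15' to '09:13:28' window:

```python
import datetime

import numpy as np
import pandas as pd

# one day of 10-minute samples (toy data); Power = row position
dates = pd.date_range(start="2009-08-01", periods=144, freq="10min")
df = pd.DataFrame({"Power": np.arange(len(dates), dtype=float)}, index=dates)

# df.index.time is an object array holding one datetime.time per row
times = df.index.time
start = datetime.time(8, 1, 15)
end = datetime.time(9, 13, 28)

# elementwise comparison with datetime.time bounds gives a boolean mask
mask = (times >= start) & (times <= end)
print(df[mask])  # rows 08:10 through 09:10
```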

Hint: download the source code of a newer pandas version and look at the definition of indexer_between_time() in pandas/tseries/index.py; you can copy and adapt it for your needs.
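Along the lines of that hint, here is a hypothetical helper, between_time_compat (not a pandas function), that reduces each timestamp to seconds since midnight and, like between_time, treats start > end as a window that wraps past midnight. It is a sketch checked against a recent pandas; it relies only on DatetimeIndex.hour/.minute/.second, but I have not tested it on 0.9.0 or Python 2.5.

```python
import numpy as np
import pandas as pd


def between_time_compat(df, start, end):
    """Hypothetical helper: rows of df whose time of day lies in [start, end].

    start/end are 'HH:MM' or 'HH:MM:SS' strings. If start > end the window
    wraps past midnight (e.g. '22:00' to '02:00'), as between_time does.
    """
    def to_seconds(text):
        parts = [int(p) for p in text.split(":")]
        parts += [0] * (3 - len(parts))  # pad missing minutes/seconds with 0
        h, m, s = parts
        return h * 3600 + m * 60 + s

    idx = df.index
    # seconds since midnight per timestamp (sub-second part is ignored,
    # a simplification compared with between_time)
    secs = np.asarray(idx.hour * 3600 + idx.minute * 60 + idx.second)
    lo, hi = to_seconds(start), to_seconds(end)
    if lo <= hi:
        mask = (secs >= lo) & (secs <= hi)
    else:  # window crosses midnight
        mask = (secs >= lo) | (secs <= hi)
    return df[mask]
```

For example, between_time_compat(df, '08:00', '09:00') keeps 08:00 through 09:00 inclusive on every date, and between_time_compat(df, '22:00', '02:00') keeps the late-evening and early-morning rows.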


UPDATE: starting with pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers.
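Since indexer_between_time returns integer positions, a positional .ix call translates to .iloc, while .loc needs the corresponding timestamp labels instead. A small sketch with toy data, assuming pandas 0.11 or later (when .iloc and .loc were introduced):

```python
import numpy as np
import pandas as pd

# one day of 10-minute samples; Power = row position, so selections are easy to read
dates = pd.date_range(start="2009-08-01", periods=144, freq="10min")
df = pd.DataFrame({"Power": np.arange(len(dates), dtype=float)}, index=dates)

pos = df.index.indexer_between_time("08:00", "09:00")  # integer positions

by_position = df.iloc[pos]        # positional selection (replaces a positional .ix)
by_label = df.loc[df.index[pos]]  # label-based: look up the timestamps themselves

assert by_position.equals(by_label)
print(by_position["Power"].tolist())  # [48.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0]
```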

Here is a NumPy-based way of doing it:

import pandas as pd
import numpy as np
import datetime

dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])

epoch = np.datetime64('1970-01-01')
start = np.datetime64('1970-01-01 08:00:00')
end = np.datetime64('1970-01-01 09:00:00')

# view the index (int64 nanoseconds since the epoch) as a NumPy datetime64[ns] array
date_array = df.index.asi8.astype('<M8[ns]')

# keep only the time of day: subtract each date's midnight, then shift onto 1970-01-01
truncated = (date_array - date_array.astype('M8[D]')) + epoch

# compare the time of day with `start` and `end` (both bounds inclusive)
mask = (start <= truncated) & (truncated <= end)

print(df[mask])

yields

                           Power
2009-08-01 08:00:00  1007.289466
2009-08-01 08:10:00   770.732422
2009-08-01 08:20:00   617.388909
2009-08-01 08:30:00  1348.384210
...
2012-07-31 08:30:00   999.133350
2012-07-31 08:40:00  1451.500408
2012-07-31 08:50:00  1161.003167
2012-07-31 09:00:00   670.545371
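A variation on the same idea skips the datetime64 arithmetic: because asi8 already counts nanoseconds since the epoch, the time of day is just that count modulo the number of nanoseconds in a day. This is a sketch, assuming a tz-naive index with nanosecond resolution (for a tz-aware index asi8 is in UTC, so the window would be in UTC rather than local time).

```python
import numpy as np
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9

# three full days plus one midnight of 10-minute samples (toy data)
dates = pd.date_range(start="2009-08-01", end="2009-08-04", freq="10min")
df = pd.DataFrame(np.random.rand(len(dates), 1) * 1500, index=dates, columns=["Power"])

# asi8: int64 nanoseconds since 1970-01-01. The remainder modulo one day is the
# time of day in ns; NumPy's % takes the sign of the divisor, so pre-1970
# timestamps also map into [0, NS_PER_DAY).
tod = df.index.asi8 % NS_PER_DAY

start = 8 * 3600 * 10**9  # 08:00:00 as ns since midnight
end = 9 * 3600 * 10**9    # 09:00:00 as ns since midnight

mask = (tod >= start) & (tod <= end)
print(df[mask])
```

Working on plain int64 values avoids depending on how an older NumPy handles datetime64 casting.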
