
Issue? selecting data between_time from DataFrame in pandas 0.17.1

I am having an issue selecting data from a pandas DataFrame with between_time. When the start and end of the query fall on two different days, the result is empty. I am using pandas 0.17.1 (Python 2.7).

I have the following data frame:

from datetime import datetime

import pandas as pd
from pandas import Timestamp

# Two columns keyed by the same four timestamps
mydf = pd.DataFrame.from_dict({
    'azi': {Timestamp('2015-05-12 00:00:14.348000'): 109.801,
            Timestamp('2015-05-12 00:00:36.125000'): 109.994,
            Timestamp('2015-05-12 00:00:57.599000'): 109.60299999999999,
            Timestamp('2015-05-12 00:01:14.576000'): 100.2},
    'ele': {Timestamp('2015-05-12 00:00:14.348000'): 180.001,
            Timestamp('2015-05-12 00:00:36.125000'): 179.999,
            Timestamp('2015-05-12 00:00:57.599000'): 179.999,
            Timestamp('2015-05-12 00:01:14.576000'): 180.001}})

Which results in:

                            azi     ele
2015-05-12 00:00:14.348     109.801     180.001
2015-05-12 00:00:36.125     109.994     179.999
2015-05-12 00:00:57.599     109.603     179.999
2015-05-12 00:01:14.576     100.200     180.001

The following query fails:

mydf['azi'].between_time(datetime(2015, 5, 11, 23, 59, 59, 850000), datetime(2015, 5, 12, 0, 1, 59, 850000))

resulting in:

Series([], Name: azi, dtype: float64)

However, the following query works:

mydf['azi'].between_time(datetime(2015, 5, 11, 0, 0, 0, 0), datetime(2015, 5, 12, 0, 1, 59, 850000))

with the right answer:

 2015-05-12 00:00:14.348    109.801
 2015-05-12 00:00:36.125    109.994
 2015-05-12 00:00:57.599    109.603
 2015-05-12 00:01:14.576    100.200
 Name: azi, dtype: float64

Questions:

  1. Am I missing something about how the function works, or is this a real bug?
  2. Is there a workaround for this? The background is that I really need to process data in 1-minute chunks whose limits do not always coincide with 00:00:00.

You can find a lot of information on how to work with a datetime index in the docs. For your case you could try loc:

In [147]: mydf['azi'].loc[datetime(2015, 5, 11, 23, 59, 59, 850000): datetime(2015, 5, 12, 0, 1, 59, 850000)]
Out[147]: 
2015-05-12 00:00:14.348    109.801
2015-05-12 00:00:36.125    109.994
2015-05-12 00:00:57.599    109.603
2015-05-12 00:01:14.576    100.200
Name: azi, dtype: float64
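
Building on the loc idea, here is a minimal sketch of walking through the data in 1-minute chunks whose limits do not fall on 00:00:00. The helper name iter_windows is hypothetical (not part of pandas), and the choice of half-open windows [t, t + width) is an assumption made so that a sample sitting exactly on a boundary is not counted twice; a boolean mask is used because loc slicing on labels includes both endpoints.

```python
from datetime import datetime, timedelta

import pandas as pd

# Rebuild the 'azi' column from the question.
azi = pd.Series(
    [109.801, 109.994, 109.603, 100.2],
    index=pd.to_datetime([
        "2015-05-12 00:00:14.348",
        "2015-05-12 00:00:36.125",
        "2015-05-12 00:00:57.599",
        "2015-05-12 00:01:14.576",
    ]),
    name="azi",
)


def iter_windows(series, start, stop, width):
    """Yield (window_start, chunk) for consecutive half-open windows.

    Hypothetical helper: each window covers [t, t + width).
    """
    t = pd.Timestamp(start)
    stop = pd.Timestamp(stop)
    while t < stop:
        nxt = t + width
        # Left edge inclusive, right edge exclusive.
        mask = (series.index >= t) & (series.index < nxt)
        yield t, series[mask]
        t = nxt


chunks = list(iter_windows(
    azi,
    datetime(2015, 5, 11, 23, 59, 59, 850000),
    datetime(2015, 5, 12, 0, 1, 59, 850000),
    timedelta(minutes=1),
))

for window_start, chunk in chunks:
    print(window_start, len(chunk))
```

With the sample data this yields two windows, the first holding three rows and the second one row.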

That covers your question 2). For question 1), see the explanation from @Jeff.

The doc-string says it all.

between_time selects all TIMES, that is, times of day regardless of the date.

In [67]: mydf.between_time?
Signature: mydf.between_time(start_time, end_time, include_start=True, include_end=True)
Docstring:
Select values between particular times of the day (e.g., 9:00-9:30 AM)

Parameters
----------
start_time : datetime.time or string
end_time : datetime.time or string
include_start : boolean, default True
include_end : boolean, default True

Returns
-------
values_between_time : type of caller
File:      ~/pandas/pandas/core/generic.py
Type:      instancemethod

In [68]: mydf
Out[68]: 
                             azi      ele
2015-05-12 00:00:14.348  109.801  180.001
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999
2015-05-12 00:01:14.576  100.200  180.001

In [70]: mydf.between_time('00:00:30','00:01:00')
Out[70]: 
                             azi      ele
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999

You can separately use partial-string indexing to select based on dates (these can be strings or datetimelikes).

In [73]: mydf.loc['20150512 00:00:30':'20150512 00:01:00']
Out[73]: 
                             azi      ele
2015-05-12 00:00:36.125  109.994  179.999
2015-05-12 00:00:57.599  109.603  179.999

I think .between_time should actually raise on objects that are not .time instances or convertible strings, but IIRC this was done for ease of implementation.
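
Until then, a thin guard can give you that stricter behaviour in your own code. This is only a sketch: strict_between_time is a hypothetical wrapper, not a pandas function, and it simply refuses anything that is not a datetime.time or a string before delegating to between_time.

```python
from datetime import datetime, time

import pandas as pd


def strict_between_time(obj, start_time, end_time):
    """Hypothetical wrapper: reject arguments that carry a date."""
    for name, value in (("start_time", start_time), ("end_time", end_time)):
        # datetime.datetime is not a subclass of datetime.time,
        # so full timestamps are rejected here.
        if not isinstance(value, (time, str)):
            raise TypeError(
                "%s must be datetime.time or str, got %s"
                % (name, type(value).__name__)
            )
    return obj.between_time(start_time, end_time)


df = pd.DataFrame(
    {"azi": [109.801, 109.994, 109.603, 100.2]},
    index=pd.to_datetime([
        "2015-05-12 00:00:14.348",
        "2015-05-12 00:00:36.125",
        "2015-05-12 00:00:57.599",
        "2015-05-12 00:01:14.576",
    ]),
)

# Strings pass through to between_time unchanged.
print(strict_between_time(df, "00:00:30", "00:01:00"))

# Full datetimes now fail loudly instead of silently returning nothing.
try:
    strict_between_time(
        df,
        datetime(2015, 5, 11, 23, 59, 59, 850000),
        datetime(2015, 5, 12, 0, 1, 59, 850000),
    )
except TypeError as exc:
    print("rejected:", exc)
```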
