Filter/select rows of pandas DataFrame by timestamp column

I am new to pandas. I have a DataFrame with two columns: dt (date-time stamp) and value.

Given start and end date-time stamps, is there an easy way to create a new DataFrame from the original one that contains only the rows between the two timestamps?

                dt    value
84    7/23/2014 7:00  0.300
85    7/23/2014 7:05  0.300
86    7/23/2014 7:10  0.312
87    7/23/2014 7:15  0.300
88    7/23/2014 7:20  0.300
89    7/23/2014 7:25  0.300
90    7/23/2014 7:30  0.300
91    7/23/2014 7:35  0.300
92    7/23/2014 7:40  0.300
93    7/23/2014 7:45  0.216
94    7/23/2014 7:50  0.204
95    7/23/2014 7:55  0.228
96    7/23/2014 8:00  0.228
97    7/23/2014 8:05  0.228
98    7/23/2014 8:10  0.228
99    7/23/2014 8:15  0.240
100   7/23/2014 8:20  0.228
101   7/23/2014 8:25  0.216
102   7/23/2014 8:30  0.228
103   7/23/2014 8:35  0.324
104   7/23/2014 8:40  0.336
105   7/23/2014 8:45  0.324
106   7/23/2014 8:50  0.324
107   7/23/2014 8:55  0.324
108   7/23/2014 9:00  0.252
109   7/23/2014 9:05  0.252
110   7/23/2014 9:10  0.240
111   7/23/2014 9:15  0.240
112   7/23/2014 9:20  0.252
113   7/23/2014 9:25  0.240
..               ...    ...
198  7/23/2014 16:30  0.132
199  7/23/2014 16:35  0.120
200  7/23/2014 16:40  0.180
201  7/23/2014 16:45  0.216
202  7/23/2014 16:50  0.204
203  7/23/2014 16:55  0.192

As long as dt already has a datetime dtype, you can filter using date strings. If it does not, you can convert it like this:

df['dt'] = pd.to_datetime(df['dt'])
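If dt still holds strings in the month/day/year format shown in the question, passing an explicit format tells to_datetime exactly how to parse them instead of letting it guess. A minimal sketch, assuming a handful of rows copied from the question (the rest are omitted):

```python
import pandas as pd

# A few rows copied from the question, with dt still stored as strings
df = pd.DataFrame({
    "dt": ["7/23/2014 7:00", "7/23/2014 7:05", "7/23/2014 7:10", "7/23/2014 16:55"],
    "value": [0.300, 0.300, 0.312, 0.192],
})

# Parse "month/day/year hour:minute" strings into datetime64 values
df["dt"] = pd.to_datetime(df["dt"], format="%m/%d/%Y %H:%M")

print(df.dtypes)
print(df)
```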

Then filter:

In [115]:

df[(df['dt'] > '2014-07-23 07:30:00') & (df['dt'] < '2014-07-23 09:00:00')]
Out[115]:
                       dt  value
index                           
91    2014-07-23 07:35:00  0.300
92    2014-07-23 07:40:00  0.300
93    2014-07-23 07:45:00  0.216
94    2014-07-23 07:50:00  0.204
95    2014-07-23 07:55:00  0.228
96    2014-07-23 08:00:00  0.228
97    2014-07-23 08:05:00  0.228
98    2014-07-23 08:10:00  0.228
99    2014-07-23 08:15:00  0.240
100   2014-07-23 08:20:00  0.228
101   2014-07-23 08:25:00  0.216
102   2014-07-23 08:30:00  0.228
103   2014-07-23 08:35:00  0.324
104   2014-07-23 08:40:00  0.336
105   2014-07-23 08:45:00  0.324
106   2014-07-23 08:50:00  0.324
107   2014-07-23 08:55:00  0.324
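Note that this mask uses strict comparisons, so rows stamped exactly 07:30 or 09:00 are left out. If you want both endpoints included, Series.between (inclusive on both ends by default) is a compact alternative. A sketch on synthetic data that mimics the question's 5-minute spacing; the constant value column and the start/end bounds are assumptions for illustration:

```python
import pandas as pd

# Synthetic 5-minute series shaped like the question's data (values are made up)
df = pd.DataFrame({
    "dt": pd.date_range("2014-07-23 07:00", "2014-07-23 16:55", freq="5min"),
    "value": 0.3,
})

start = pd.Timestamp("2014-07-23 07:30")
end = pd.Timestamp("2014-07-23 09:00")

# between() keeps start <= dt <= end, i.e. both endpoints are included
subset = df[df["dt"].between(start, end)]
print(subset["dt"].iloc[0], subset["dt"].iloc[-1], len(subset))
```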

The answer above is correct, but for people who, like me, stumble upon this question more than 5 years after it was posted, I want to add this remark.

If you want to filter on a sorted column (and timestamp columns often are sorted), it is more efficient to use the searchsorted method of a pandas Series, which locates the boundaries by binary search in O(log(n)) time instead of scanning every row in O(n).

In the example below, the difference is much larger than a factor of 1000. Because of the difference in complexity, the gap can of course be made arbitrarily large, but the chosen numbers are the ones I was using when I stumbled upon this question.

import pandas as pd
import numpy as np
import timeit

N = 500000
M = 200

data = np.hstack([np.arange(0.,N).reshape(N,1),np.random.randn(N,M-1)])
df = pd.DataFrame(data,columns=["column"+str(i) for i in range(M)])

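def return_first(df):
    # O(n): builds a boolean mask over the whole column
    return df[(df['column0'] > 100.5) & (df['column0'] < 400000.5)]

def return_second(df):
    # O(log(n)): binary search for the slice boundaries (column must be sorted)
    t1 = df['column0'].searchsorted(100.5)
    t2 = df['column0'].searchsorted(400000.5)
    # searchsorted returns positions, so slice by position with iloc
    return df.iloc[t1:t2]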


if __name__ == '__main__':
    t = timeit.timeit(lambda: return_first(df), number=100)
    print(t)
    t = timeit.timeit(lambda: return_second(df), number=100)
    print(t)

Results (total seconds for 100 calls each):

59.1751627
0.015401400000001786
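The benchmark uses a float column, but the same idea carries over to the question's timestamp column, assuming dt is a datetime column sorted in ascending order. In this sketch, side="left" for the start and side="right" for the end make both endpoints inclusive, and iloc slices by the positions that searchsorted returns. The sample data and bounds are made up for illustration:

```python
import pandas as pd

# Synthetic, sorted 5-minute timestamps shaped like the question's data
df = pd.DataFrame({
    "dt": pd.date_range("2014-07-23 07:00", "2014-07-23 16:55", freq="5min"),
    "value": 0.3,
})

start = pd.Timestamp("2014-07-23 07:30")
end = pd.Timestamp("2014-07-23 09:00")

# searchsorted assumes ascending order; fail loudly instead of returning wrong rows
if not df["dt"].is_monotonic_increasing:
    raise ValueError("dt must be sorted in ascending order for searchsorted")

# Binary search: first position with dt >= start, and first position with dt > end
lo = df["dt"].searchsorted(start, side="left")
hi = df["dt"].searchsorted(end, side="right")

# Positional slice: rows with start <= dt <= end
subset = df.iloc[lo:hi]
print(lo, hi, len(subset))
```

If the column is not sorted, searchsorted returns meaningless positions, hence the explicit check. Sorting first costs O(n log(n)), so this only pays off when the data is already sorted or is filtered many times.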

You can also use query:

In [25]: df.query('"2014-07-23 07:55:00" <= dt <= "2014-07-23 08:20:00"')
Out[25]: 
                     dt  value
95  2014-07-23 07:55:00  0.228
96  2014-07-23 08:00:00  0.228
97  2014-07-23 08:05:00  0.228
98  2014-07-23 08:10:00  0.228
99  2014-07-23 08:15:00  0.240
100 2014-07-23 08:20:00  0.228
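The bounds don't have to be hard-coded in the query string: query can refer to local Python variables with the @ prefix. A sketch using the same kind of synthetic 5-minute data (an assumption), with start and end as illustrative variable names:

```python
import pandas as pd

# Synthetic 5-minute series shaped like the question's data (values are made up)
df = pd.DataFrame({
    "dt": pd.date_range("2014-07-23 07:00", "2014-07-23 16:55", freq="5min"),
    "value": 0.3,
})

start = pd.Timestamp("2014-07-23 07:55")
end = pd.Timestamp("2014-07-23 08:20")

# @name refers to a variable in the calling scope; the chained comparison is inclusive
subset = df.query("@start <= dt <= @end")
print(subset)
```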

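Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.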