
Select pandas dataframe rows by the time of day in a datetime column

I have the following dataframe:

                    timestamp  mes
0     2019-01-01 18:15:55.700  1228
1     2019-01-01 18:35:56.872  1402
2     2019-01-01 18:35:56.872  1560
3     2019-01-01 19:04:25.700  1541
4     2019-01-01 19:54:23.150  8754
5     2019-01-02 18:01:00.025  4124
6     2019-01-02 18:17:56.125  9736
7     2019-01-02 18:58:59.799  1597
8     2019-01-02 20:10:15.896  5285

How can I select only the rows where timestamp falls between a start_time and an end_time, for every day in the dataframe? This is essentially what .between_time() does, but here the timestamp column can't be the index because it contains repeated values. The dataframe is also just one chunk from pd.read_csv(), and I would have to do this for several million chunks. Would it be faster to use, for example, numpy datetime functionality? I could derive a time column from timestamp and build a mask on that, but I'm afraid this would be too slow.

EDIT: I added more rows. This is the expected result for, say, start_time=datetime.time(18) and end_time=datetime.time(19):

                    timestamp  mes
0     2019-01-01 18:15:55.700  1228
1     2019-01-01 18:35:56.872  1402
2     2019-01-01 18:35:56.872  1560
5     2019-01-02 18:01:00.025  4124
6     2019-01-02 18:17:56.125  9736
7     2019-01-02 18:58:59.799  1597

My code (works but is slow):

df['time'] = df.timestamp.apply(lambda x: x.time())
mask = (df.time<end) & (df.time>=start)
selected = df.loc[mask]
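
The per-row .apply() above is the likely bottleneck. A minimal vectorized sketch of the same half-open mask (start inclusive, end exclusive) is shown below, assuming timestamp is already a datetime64 column. The helper name to_timedelta_of_day is hypothetical and only used for illustration. It compares each row's offset from midnight against the two bounds, so no Python-level loop is needed.

```python
import datetime

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-01 18:15:55.700", "2019-01-01 18:35:56.872",
        "2019-01-01 18:35:56.872", "2019-01-01 19:04:25.700",
        "2019-01-01 19:54:23.150", "2019-01-02 18:01:00.025",
        "2019-01-02 18:17:56.125", "2019-01-02 18:58:59.799",
        "2019-01-02 20:10:15.896",
    ]),
    "mes": [1228, 1402, 1560, 1541, 8754, 4124, 9736, 1597, 5285],
})

start = datetime.time(18)
end = datetime.time(19)


def to_timedelta_of_day(t):
    """Hypothetical helper: convert a datetime.time to an offset from midnight."""
    return pd.Timedelta(hours=t.hour, minutes=t.minute,
                        seconds=t.second, microseconds=t.microsecond)


# Offset of every timestamp from midnight of its own day (vectorized).
time_of_day = df["timestamp"] - df["timestamp"].dt.normalize()

# Same semantics as the original mask: start <= time < end.
mask = (time_of_day >= to_timedelta_of_day(start)) & (time_of_day < to_timedelta_of_day(end))
selected = df.loc[mask]
print(selected)
```

Whether this is fast enough for millions of chunks would need to be measured on the real data.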

If your column is already set to datetime:

start = df["timestamp"] >= "2019-01-01 18:15:55.700"
end = df["timestamp"] <= "2019-01-01 18:15:55.896"
between_two_dates = start & end
df.loc[between_two_dates]

Works for me. Just convert timestamp to datetime and set it as the index:

import pandas as pd

# Sample data
df = pd.DataFrame({'timestamp': ['2019-01-01 18:15:55.700', '2019-01-01 18:17:55.700', '2019-01-01 18:19:55.896'], 'mes': [1228, 1402, 1560]})
df['timestamp'] = pd.to_datetime(df['timestamp'])  # coerce timestamp to datetime
df.set_index('timestamp', inplace=True)  # set timestamp as the index
df.between_time('18:16', '20:15')  # select rows between the two times of day
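
If timestamp should stay a regular column, a possible variant is to build a DatetimeIndex from it on the fly and use DatetimeIndex.indexer_between_time, which returns integer row positions. This is only a sketch on the question's sample data. It assumes a pandas version where indexer_between_time accepts include_end, and repeated timestamps are not a problem because nothing is looked up by label.

```python
import datetime

import pandas as pd

# Sample data copied from the question (note the repeated timestamp).
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-01 18:15:55.700", "2019-01-01 18:35:56.872",
        "2019-01-01 18:35:56.872", "2019-01-01 19:04:25.700",
        "2019-01-01 19:54:23.150", "2019-01-02 18:01:00.025",
        "2019-01-02 18:17:56.125", "2019-01-02 18:58:59.799",
        "2019-01-02 20:10:15.896",
    ]),
    "mes": [1228, 1402, 1560, 1541, 8754, 4124, 9736, 1597, 5285],
})

# Wrap the column in a DatetimeIndex without changing df's own index.
idx = pd.DatetimeIndex(df["timestamp"])

# Integer positions of rows with 18:00 <= time of day < 19:00.
positions = idx.indexer_between_time(
    datetime.time(18), datetime.time(19), include_end=False
)

# Select by position; the original integer index and column layout are kept.
selected = df.iloc[positions]
print(selected)
```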

