
Find the value in a dataframe closest to a specific time ago

I have a dataframe with a date-time column and a value column. I'd like to create another column that holds the value at the time closest to a given interval before each row's date-time.

Specifically, I'd like a column called "Value 2 hours ago" whose entry on each row is the "Value" at the time closest to 2 hours before that row's date-time.

For example, if the "Date-Time" column shows "01/01/2014 12:10:00", the new column should return the "Value" from the row whose "Date-Time" is closest to "01/01/2014 10:10:00".
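A quick check of the arithmetic in this example, as a minimal pandas sketch: parse the timestamp (the explicit month/day/year format string is an assumption about how the dates are written) and subtract a 2-hour pd.Timedelta to get the time to look up.

```python
import pandas as pd

# Parse the example timestamp (assumed month/day/year order)
row_time = pd.to_datetime("01/01/2014 12:10:00", format="%m/%d/%Y %H:%M:%S")

# The time to look up is exactly 2 hours earlier
target = row_time - pd.Timedelta(hours=2)
print(target)  # 2014-01-01 10:10:00
```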

Even better would be the ability to apply conditions based on how far the actual time interval is from the desired 2 hours. For example: "return the value closest to 2 hours ago, unless it is less than 1 hour ago or more than 3 hours ago, in which case return nothing".
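The conditional rule can be expressed as a mask on the actual time gap. This is only a sketch: it assumes a nearest-match lookup has already produced each matched value together with how far back it lies, and the column names value, gap and value_2h_ago are hypothetical. The numbers are what a plain nearest match gives for the last three rows of the sample data below (gaps of 239, 61 and 179 minutes).

```python
import pandas as pd

# Hypothetical output of a nearest-match lookup: the matched value and the
# actual time gap back to it (last three rows of the question's sample data)
matched = pd.DataFrame({
    'value': [9, 12, 3],
    'gap': pd.to_timedelta([239, 61, 179], unit='min'),
})

# Keep the value only when the real gap is between 1 and 3 hours
# (Series.between is inclusive on both ends by default); otherwise NaN
in_window = matched['gap'].between(pd.Timedelta(hours=1), pd.Timedelta(hours=3))
matched['value_2h_ago'] = matched['value'].where(in_window)
print(matched)
```

The first match (about 4 hours back) falls outside the window and becomes NaN; the other two are kept.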

To illustrate, here is a sample input dataframe. I can easily compute the time 2 hours ago and then self-merge the dataframe on the two date-time columns. The challenge is to make this merge use the nearest match rather than an exact match.

import pandas as pd

df = pd.DataFrame({'Date-Time' : pd.Series(["01/01/2014 04:11:00", "01/01/2014 08:10:00","01/01/2014 09:11:00","01/01/2014 12:10:00"], index=['1', '2','3', '4']),'Value' : pd.Series([9,12,3,21], index=['1', '2','3','4'])})
df["Date-Time"]=pd.to_datetime(df["Date-Time"], format="%m/%d/%Y %H:%M:%S")
df["t_2h_ago"]=df["Date-Time"]-pd.to_timedelta('2h')
# Exact-match self-merge: each row picks up the row whose time equals its t_2h_ago
merged=pd.merge(df,df,how='left',left_on='t_2h_ago',right_on='Date-Time')
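For the nearest-match merge itself, one option (a different technique from the cartesian-product answer) is pandas.merge_asof with direction='nearest', which requires both frames to be sorted on their keys. In this sketch, tolerance=pd.Timedelta(hours=1) also implements the optional rule: a match at most 1 hour away from the 2-hours-ago target lies between 1 and 3 hours before the row. The helper column names target and Matched-Time are hypothetical, and direction='nearest' assumes a reasonably recent pandas.

```python
import pandas as pd

# Sample data from the question, with the strings parsed to datetimes
df = pd.DataFrame({
    'Date-Time': ["01/01/2014 04:11:00", "01/01/2014 08:10:00",
                  "01/01/2014 09:11:00", "01/01/2014 12:10:00"],
    'Value': [9, 12, 3, 21],
})
df['Date-Time'] = pd.to_datetime(df['Date-Time'], format="%m/%d/%Y %H:%M:%S")

# Time to look up for each row: exactly 2 hours earlier
left = df.assign(target=df['Date-Time'] - pd.Timedelta(hours=2))
left = left.sort_values('target')

# Lookup table with renamed columns so nothing collides after the merge
right = df.rename(columns={'Date-Time': 'Matched-Time', 'Value': 'Value 2 hours ago'})
right = right.sort_values('Matched-Time')

# Nearest-key merge; matches farther than 1 hour from the target become NaN
result = pd.merge_asof(left, right, left_on='target', right_on='Matched-Time',
                       direction='nearest', tolerance=pd.Timedelta(hours=1))
print(result[['Date-Time', 'Value', 'Matched-Time', 'Value 2 hours ago']])
```

With the sample data, the first two rows get NaN (their nearest candidates are about 2 hours from the target) and the last two get 12 and 3.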

Take the cartesian product of the dataframe with itself. Then, for each pair of rows, compute how far the gap between the two timestamps is from the 2-hour target; only strictly earlier rows are considered, so a row is never matched with itself or with a later one. Note that I assume each date-time is unique. Then group by date-time and take the minimum of each group: this is how far, in seconds, the best match is from exactly 2 hours ago. Finally, join back on that minimum to recover the matching row and its value.

import pandas as pd

df = pd.DataFrame({'Date-Time' : pd.Series(["01/01/2014 04:11:00", "01/01/2014 08:10:00","01/01/2014 09:11:00","01/01/2014 12:10:00"], index=['1', '2','3', '4']),'Value' : pd.Series([9,12,3,21], index=['1', '2','3','4'])})
df['Date-Time'] = pd.to_datetime(df['Date-Time'], format="%m/%d/%Y %H:%M:%S")
target = pd.Timedelta(hours=2)

# Cartesian product: every row paired with every row (columns get _x/_y suffixes).
# how='cross' needs pandas >= 1.2; older versions can merge on a constant 'key' column.
df = pd.merge(df, df, how='cross')

# Only rows strictly earlier than the current one can be "ago"
df = df[df['Date-Time_y'] < df['Date-Time_x']].copy()

# Seconds between the actual gap and the 2-hour target
df['diff'] = ((df['Date-Time_x'] - df['Date-Time_y']) - target).abs().dt.total_seconds()

# Smallest distance to the target for each timestamp
df2 = df.groupby('Date-Time_x', as_index=False)['diff'].min()

# Join back to recover the matching row (relies on unique date-times)
df3 = pd.merge(df2, df, on=['diff', 'Date-Time_x'])
print(df3)
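The cartesian-product approach above returns the best pair for each timestamp, but it does not yet build the "Value 2 hours ago" column or apply the 1-to-3-hour rule from the question. A sketch that does both, again assuming unique date-times: it keeps only earlier rows between 1 and 3 hours back (Series.between is inclusive on both ends by default), picks the candidate closest to 2 hours with idxmin, and left-merges the result so rows without a qualifying match get NaN. The names pairs, best and out are illustrative, and how='cross' assumes pandas 1.2 or later.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Date-Time': ["01/01/2014 04:11:00", "01/01/2014 08:10:00",
                  "01/01/2014 09:11:00", "01/01/2014 12:10:00"],
    'Value': [9, 12, 3, 21],
})
df['Date-Time'] = pd.to_datetime(df['Date-Time'], format="%m/%d/%Y %H:%M:%S")
target = pd.Timedelta(hours=2)

# Cartesian product; overlapping columns get _x / _y suffixes
pairs = pd.merge(df, df, how='cross')
gap = pairs['Date-Time_x'] - pairs['Date-Time_y']

# The question's rule: only earlier rows between 1 and 3 hours back qualify
pairs = pairs[gap.between(pd.Timedelta(hours=1), pd.Timedelta(hours=3))].copy()
pairs['diff'] = (pairs['Date-Time_x'] - pairs['Date-Time_y'] - target).abs().dt.total_seconds()

# For each timestamp, keep the candidate whose gap is closest to 2 hours
best = pairs.loc[pairs.groupby('Date-Time_x')['diff'].idxmin(), ['Date-Time_x', 'Value_y']]
best = best.rename(columns={'Date-Time_x': 'Date-Time', 'Value_y': 'Value 2 hours ago'})

# Left merge so rows with no qualifying match get NaN
out = df.merge(best, on='Date-Time', how='left')
print(out)
```

On the sample data this yields NaN for the first two rows and 12 and 3 for the last two.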

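Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.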

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM