Python - Merging data by time interval (R data.table analogue?)

Question

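I'm just learning Python and have a question about combining data frames by time. For example, say I have 2 separate data frames with irregular time intervals, grouped by study_id. I'd like to join the rows that are within 2 hours of each other.

Previously, I used the data.table package in R for this. An example of that code is below.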

df_new <- df1[df2, on="Study_ID", allow.cartesian=T][difftime(`date_df1`, `date_df2`, units="hours") <= 2 & difftime(`date_df1`, `date_df2`, units="hours") >= - 2]

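This code then binds every instance where the dates from the two data frames are within 2 hours of each other. Is there similar Python code? Ideally, I'd like to merge these rows so that I can find the maximum value among the measurements taken within 2 hours before or after a given measurement.

Any ideas? Thanks!

Edit: example data frames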

    df1
    ID   Date           HeartRate
    1    4/1/2019 04:13     56
    1    4/2/2019 05:30     45
    1    4/3/2019 22:10     61
    2    4/3/2019 23:13     62
    2    4/5/2019 15:10     67

    df2
    ID   Date             Weight
     1    4/1/2019 06:10     112
     1    4/2/2019 02:30     114
     1    4/3/2019 21:10     112.5
     2    4/3/2019 23:10     113
     2    4/4/2019 00:00     114

    Output (this is what I would love!)
    ID   Date(blood pressure)  HeartRate   Date(weight)   Weight
    1    4/1/2019 4:13            56       4/1/2019 06:10   112
    1    4/3/2019 22:10           61       4/3/2019 21:10   112.5
    2    4/3/2019 23:13           62       4/3/2019 23:10   113
    2    4/3/2019 23:13           62       4/4/2019 00:00   114

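In this example, the second row in each data frame is simply dropped, because those measurements have no pair within 2 hours. However, the second-to-last row shown in df1 is repeated, because it has 2 cases in df2 within 2 hours.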

Answer 1

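First you need to store the dates as datetime. Then you can do something similar to the data.table approach: perform a join between the two dataframes, and then filter for the records whose time difference is less than 2 hours.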
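As a side note on the datetime step, here is a minimal sketch assuming the dates arrive as month/day/year strings like the question's sample (`4/1/2019 04:13`). Passing an explicit `format` avoids relying on pandas to infer whether the day or the month comes first. The `raw` frame is illustrative only.

```python
import pandas as pd

# Illustrative frame; values copied from the question's df1 sample.
raw = pd.DataFrame({
    "ID": [1, 1],
    "Date": ["4/1/2019 04:13", "4/2/2019 05:30"],
})

# Assumption: month/day/year with a 24-hour clock, as in the sample data.
raw["Date"] = pd.to_datetime(raw["Date"], format="%m/%d/%Y %H:%M")

print(raw.dtypes)
```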

import numpy as np
import pandas as pd

# store the dates as datetime
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

# join the dataframes on ID; this pairs every df1 row with every df2 row sharing that ID
merged = df1.merge(df2, on='ID',
                   suffixes=('(blood pressure)', '(weight)'))
# calculate the absolute hour difference between the two dates
hour_dif = np.abs(merged['Date(blood pressure)'] - merged['Date(weight)']) / np.timedelta64(1, 'h')
# keep only the pairs that are less than 2 hours apart
merged[hour_dif < 2]

which yields

#     ID Date(blood pressure)  HeartRate        Date(weight)  Weight
# 0    1  2019-04-01 04:13:00         56 2019-04-01 06:10:00   112.0
# 8    1  2019-04-03 22:10:00         61 2019-04-03 21:10:00   112.5
# 9    2  2019-04-03 23:13:00         62 2019-04-03 23:10:00   113.0
# 10   2  2019-04-03 23:13:00         62 2019-04-04 00:00:00   114.0
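
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's sample frames and wraps the join-then-filter idea in a helper. The helper name `merge_within_hours`, its parameters, and the `_hr`/`_wt` suffixes are hypothetical, not part of pandas. It uses an inclusive `<=` window to mirror the R code in the question. The last step is one possible reading of the question's "maximum value" goal: the largest weight matched to each heart-rate measurement.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": ["4/1/2019 04:13", "4/2/2019 05:30", "4/3/2019 22:10",
             "4/3/2019 23:13", "4/5/2019 15:10"],
    "HeartRate": [56, 45, 61, 62, 67],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": ["4/1/2019 06:10", "4/2/2019 02:30", "4/3/2019 21:10",
             "4/3/2019 23:10", "4/4/2019 00:00"],
    "Weight": [112, 114, 112.5, 113, 114],
})
# Assumption: dates are month/day/year, as in the sample.
for df in (df1, df2):
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")


def merge_within_hours(left, right, key, hours, suffixes):
    """Hypothetical helper: join on `key`, keep pairs at most `hours` apart."""
    # Every left row is paired with every right row that shares the key.
    merged = left.merge(right, on=key, suffixes=suffixes)
    # Absolute gap between the two (suffixed) Date columns.
    gap = (merged["Date" + suffixes[0]] - merged["Date" + suffixes[1]]).abs()
    return merged[gap <= pd.Timedelta(hours=hours)].reset_index(drop=True)


pairs = merge_within_hours(df1, df2, key="ID", hours=2,
                           suffixes=("_hr", "_wt"))
print(pairs)

# Largest weight recorded within the window of each heart-rate measurement.
max_weight = pairs.groupby(["ID", "Date_hr"])["Weight"].max()
print(max_weight)
```

On the sample data this keeps four pairs, including both weights recorded within 2 hours of the 23:13 heart-rate reading for ID 2, which matches the output requested in the question. Because the join pairs every row with every other row for the same ID before filtering, the intermediate frame can get large when an ID has many measurements.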

Answer 2

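I want to thank @josemz for the great answer! It worked; my long string of problems turned out to come from mistakes in my data cleaning. Thanks so much for your help!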
