Using timedelta to keep pandas DataFrame rows in df2 for each row in df1

Question

I have two pandas dataframes. I would like to keep every row in df2 whose Type equals the Type of a row in df1 AND whose Date is within one day (-1 day or +1 day) of that row's Date. How can I do this?

df1

   IBSN  Type          Date
0     1     X    2014-08-17
1     1     Y    2019-09-22

df2

   IBSN  Type          Date
0     2     X    2014-08-16
1     2     D    2019-09-22
2     9     X    2014-08-18
3     3     H    2019-09-22
4     3     Y    2019-09-23
5     5     G    2019-09-22

res

   IBSN  Type          Date
0     2     X    2014-08-16 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] - 1
1     9     X    2014-08-18 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] + 1
2     3     Y    2019-09-23 <-- keep because Type = df1[1]['Type'] AND Date = df1[1]['Date'] + 1

Answer 1

This should do it:

import pandas as pd
from datetime import timedelta

# create dummy data
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date'])  # may not be necessary if your Date column already contains datetime objects

df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'], [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'], [3, 'Y', '2014-09-23'], [5, 'G', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date'])  # may not be necessary if your Date column already contains datetime objects

# add date boundaries (one day either side) to the first dataframe
df1['Date_from'] = df1['Date'] - timedelta(days=1)
df1['Date_to'] = df1['Date'] + timedelta(days=1)

# merge the date boundaries into df2 on 'Type', keep rows whose Date lies between
# Date_from and Date_to (inclusive), then drop the 'Date_from' and 'Date_to' columns
merged = df2.merge(df1.loc[:, ['Type', 'Date_from', 'Date_to']], on='Type', how='left')
res = merged[(merged['Date'] >= merged['Date_from']) & (merged['Date'] <= merged['Date_to'])] \
    .drop(['Date_from', 'Date_to'], axis=1)
print(res)

Note that in the dummy data above, row 4 of df2 is entered as (3 Y 2014-09-23), whereas the question lists 2019-09-23. With the 2014 date the row does not remain, because it is not within one day of the Type Y date in df1 (2019-09-22). With the question's date, 2019-09-23, it falls inside the window and is kept, as in the expected result.
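
A sketch of the same merge-and-filter idea wrapped in a function and run on the question's data as given (row 4 of df2 dated 2019-09-23). The helper name `keep_rows_near` and its `days` parameter are hypothetical, not part of pandas or of the answer above. It uses an inner merge and carries df2's original index through the merge, so each df2 row is returned once and in its original order even if df1 holds several rows of the same Type with overlapping windows (a case the question does not rule out).

```python
import pandas as pd
from datetime import timedelta


def keep_rows_near(df1, df2, days=1):
    """Hypothetical helper: return the rows of df2 whose Type matches some row
    of df1 and whose Date lies within +/- `days` days of that row's Date
    (both ends inclusive)."""
    window = timedelta(days=days)
    # one [Date_from, Date_to] window per df1 row
    bounds = df1[['Type']].assign(Date_from=df1['Date'] - window,
                                  Date_to=df1['Date'] + window)
    # carry df2's original index along as a column so it survives the merge
    merged = df2.reset_index().merge(bounds, on='Type', how='inner')
    in_window = (merged['Date'] >= merged['Date_from']) & (merged['Date'] <= merged['Date_to'])
    # a df2 row may fall inside several windows of the same Type;
    # selecting by index keeps it once, in df2's original order
    hits = merged.loc[in_window, 'index']
    return df2[df2.index.isin(hits)]


# the question's data, with row 4 of df2 dated 2019-09-23 as in the question
df1 = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22'])})
df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22', '2014-08-18',
                                            '2019-09-22', '2019-09-23', '2019-09-22'])})

res = keep_rows_near(df1, df2)
print(res)
```

On this data it keeps rows 0, 2 and 4 (IBSN 2, 9 and 3), matching the expected result in the question.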

Answer 2

Assume the Date columns in both dataframes already have dtype datetime. I would construct an IntervalIndex of [Date - 1 day, Date + 1 day] windows and assign it as the index of df1. Then map df2's Date through df1.Type, which gives, for each date, the Type of the df1 row whose window contains it. Finally, compare that with df2's Type to create a boolean mask for slicing.

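import pandas as pd

# one closed interval [Date - 1 day, Date + 1 day] per df1 row
iix = pd.IntervalIndex.from_arrays(df1.Date + pd.Timedelta(days=-1),
                                   df1.Date + pd.Timedelta(days=1), closed='both')
df1 = df1.set_index(iix)
# for each df2 date, look up the Type of the df1 row whose interval contains it
s = df2['Date'].map(df1.Type)
# keep rows whose own Type matches the looked-up Type
df_final = df2[df2.Type == s]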

Out[1131]:
   IBSN Type       Date
0     2    X 2014-08-16
2     9    X 2014-08-18
4     3    Y 2019-09-23
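
To make the lookup above explicit, the sketch below does the same thing step by step with `IntervalIndex.get_indexer`, which returns, for each df2 date, the position of the interval containing it, or -1 if none does. As far as I know, `Series.map` with a Series argument resolves values through the mapper's index in much the same way, so it shares the same limitation: when two df1 windows overlap (two df1 dates at most two days apart), pandas cannot pick a single interval and raises an error rather than returning a match. The names `pos`, `types` and `matched_type` are illustrative only.

```python
import numpy as np
import pandas as pd

# the question's data (Date columns already datetime64)
df1 = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22'])})
df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22', '2014-08-18',
                                            '2019-09-22', '2019-09-23', '2019-09-22'])})

# one closed interval [Date - 1 day, Date + 1 day] per df1 row
iix = pd.IntervalIndex.from_arrays(df1['Date'] - pd.Timedelta(days=1),
                                   df1['Date'] + pd.Timedelta(days=1), closed='both')

# a point lookup is only unambiguous when no two intervals overlap
if iix.is_overlapping:
    raise ValueError('df1 windows overlap; an interval lookup is ambiguous')

# position of the interval containing each df2 date, or -1 if none does
pos = iix.get_indexer(df2['Date'])

# Type of the df1 row owning that interval (None where nothing matched)
types = df1['Type'].to_numpy()
matched_type = np.where(pos >= 0, types[pos], None)

# keep df2 rows whose own Type equals the looked-up Type
mask = df2['Type'].to_numpy() == matched_type
print(pos)
print(df2[mask])
```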

2 answers

Answer 1
2 votes, accepted, 2020-01-23 02:22:18

Answer 2
1 vote, 2020-01-23 02:57:16