[英]Keep pandas DataFrame rows in df2 for each row in df1 with timedelta
I have two pandas dataframes.我有两个熊猫数据框。 I would like to keep all rows in
df2
where Type
is equal to Type
in df1
AND Date
is between Date
in df1
(- 1 day or + 1 day).我想保持所有行
df2
其中Type
等于Type
在df1
和Date
之间Date
在df1
( - 1天或+ 1天)。 How can I do this?我怎样才能做到这一点?
df1 df1
IBSN Type Date
0 1 X 2014-08-17
1 1 Y 2019-09-22
df2 df2
IBSN Type Date
0 2 X 2014-08-16
1 2 D 2019-09-22
2 9 X 2014-08-18
3 3 H 2019-09-22
4 3 Y 2019-09-23
5 5 G 2019-09-22
res资源
IBSN Type Date
0 2 X 2014-08-16 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] - 1
1 9 X 2014-08-18 <-- keep because Type = df1[0]['Type'] AND Date = df1[0]['Date'] + 1
2 3 Y 2019-09-23 <-- keep because Type = df1[1]['Type'] AND Date = df1[1]['Date'] + 1
This should do it:这应该这样做:
import pandas as pd
from datetime import timedelta
# create dummy data
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date']) # might not be necessary if your Date column already contain datetime objects
df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'], [9, 'X', '2014-08-18'], [3, 'H', '2019-09-22'], [3, 'Y', '2014-09-23'], [5, 'G', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date']) # might not be necessary if your Date column already contain datetime objects
# add date boundaries to the first dataframe
df1['Date_from'] = df1['Date'].apply(lambda x: x - timedelta(days=1))
df1['Date_to'] = df1['Date'].apply(lambda x: x + timedelta(days=1))
# merge the date boundaries to df2 on 'Type'. Filter rows where date is between
# data_from and date_to (inclusive). Drop 'date_from' and 'date_to' columns
df2 = df2.merge(df1.loc[:, ['Type', 'Date_from', 'Date_to']], on='Type', how='left')
df2[(df2['Date'] >= df2['Date_from']) & (df2['Date'] <= df2['Date_to'])].\
drop(['Date_from', 'Date_to'], axis=1)
Note that according to your logic, row 4 in df2 (3 Y 2014-09-23) should not remain as its date (2014) is not in between the given dates in df1 (year 2019).请注意,根据您的逻辑,df2 (3 Y 2014-09-23) 中的第 4 行不应保留,因为其日期 (2014) 不在 df1 (2019 年) 中的给定日期之间。
Assume Date
columns in both dataframes are already in dtype datetime
.假设两个数据框中的
Date
列都已经在 dtype datetime
。 I would construct IntervalIndex
to assign to index of df1
.我会构造
IntervalIndex
来分配给df1
索引。 Map
columns Type
of df1
to df2
.将
df1
列Type
Map
到df2
。 Finally check equality to create mask to slice最后检查相等性以创建要切片的掩码
iix = pd.IntervalIndex.from_arrays(df1.Date + pd.Timedelta(days=-1),
df1.Date + pd.Timedelta(days=1), closed='both')
df1 = df1.set_index(iix)
s = df2['Date'].map(df1.Type)
df_final = df2[df2.Type == s]
Out[1131]:
IBSN Type Date
0 2 X 2014-08-16
2 9 X 2014-08-18
4 3 Y 2019-09-23
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.