Keep pandas DataFrame rows in df2 for each row in df1 with timedelta

I have two pandas DataFrames. I would like to keep all rows in df2 where Type is equal to Type in df1 AND Date is within one day of Date in df1 (from Date - 1 day to Date + 1 day, inclusive), with both conditions checked against the same df1 row. How can I do this?

df1

   IBSN  Type          Date
0     1     X    2014-08-17
1     1     Y    2019-09-22

df2

   IBSN  Type          Date
0     2     X    2014-08-16
1     2     D    2019-09-22
2     9     X    2014-08-18
3     3     H    2019-09-22
4     3     Y    2019-09-23
5     5     G    2019-09-22

res (expected result)

   IBSN  Type          Date
0     2     X    2014-08-16 <-- keep because Type = df1.loc[0, 'Type'] AND Date = df1.loc[0, 'Date'] - 1 day
1     9     X    2014-08-18 <-- keep because Type = df1.loc[0, 'Type'] AND Date = df1.loc[0, 'Date'] + 1 day
2     3     Y    2019-09-23 <-- keep because Type = df1.loc[1, 'Type'] AND Date = df1.loc[1, 'Date'] + 1 day

This should do it:

import pandas as pd

# create dummy data (same values as in the question)
df1 = pd.DataFrame([[1, 'X', '2014-08-17'], [1, 'Y', '2019-09-22']], columns=['IBSN', 'Type', 'Date'])
df1['Date'] = pd.to_datetime(df1['Date'])  # not needed if the Date column already contains datetimes

df2 = pd.DataFrame([[2, 'X', '2014-08-16'], [2, 'D', '2019-09-22'], [9, 'X', '2014-08-18'],
                    [3, 'H', '2019-09-22'], [3, 'Y', '2019-09-23'], [5, 'G', '2019-09-22']],
                   columns=['IBSN', 'Type', 'Date'])
df2['Date'] = pd.to_datetime(df2['Date'])  # not needed if the Date column already contains datetimes

# add the date boundaries (Date - 1 day and Date + 1 day) to the first dataframe
df1['Date_from'] = df1['Date'] - pd.Timedelta(days=1)
df1['Date_to'] = df1['Date'] + pd.Timedelta(days=1)

# merge the date boundaries into df2 on 'Type' (rows with no matching Type get NaT),
# keep rows whose Date lies between Date_from and Date_to (inclusive),
# then drop the helper columns 'Date_from' and 'Date_to'
merged = df2.merge(df1[['Type', 'Date_from', 'Date_to']], on='Type', how='left')
res = merged[(merged['Date'] >= merged['Date_from']) & (merged['Date'] <= merged['Date_to'])]
res = res.drop(columns=['Date_from', 'Date_to'])

With the data from the question, this keeps rows 0, 2 and 4 of df2 (X 2014-08-16, X 2014-08-18 and Y 2019-09-23), which matches the expected result.
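One caveat with the merge approach: if df1 contains several rows with the same Type, the merge repeats each matching df2 row once per df1 row, so a df2 row can show up more than once in the result. Below is a minimal sketch that keeps df2's original index and returns each row at most once. The function name `rows_within_window` is hypothetical, and it assumes the question's column names and an unnamed index on df2 (so `reset_index()` produces a column called `index`).

```python
import pandas as pd


def rows_within_window(df1, df2, days=1):
    """Rows of df2 whose Type equals the Type of some df1 row and whose Date is
    within +/- `days` days (inclusive) of that same df1 row's Date."""
    delta = pd.Timedelta(days=days)
    bounds = pd.DataFrame({'Type': df1['Type'],
                           'Date_from': df1['Date'] - delta,
                           'Date_to': df1['Date'] + delta})
    # Carry df2's (assumed unnamed) index along as a column 'index' so it survives the merge.
    merged = df2.reset_index().merge(bounds, on='Type', how='inner')
    # Series.between is inclusive on both ends by default.
    hit = merged['Date'].between(merged['Date_from'], merged['Date_to'])
    keep = merged.loc[hit, 'index'].unique()
    # Select from df2 itself: each row at most once, in df2's original order.
    return df2[df2.index.isin(keep)]


# Usage: the question's df2 and a df1 that contains Type 'X' twice.
df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22', '2014-08-18',
                                            '2019-09-22', '2019-09-23', '2019-09-22'])})
df1 = pd.DataFrame({'IBSN': [1, 1, 1],
                    'Type': ['X', 'Y', 'X'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22', '2014-08-18'])})

res = rows_within_window(df1, df2)
print(res)  # row 2 (X 2014-08-18) falls in both X windows but is returned only once
```

Selecting back from df2 with `index.isin` rather than returning the merged frame is what removes the duplicates and restores df2's row order.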

Assuming the Date columns in both DataFrames already have dtype datetime64, I would construct an IntervalIndex of closed intervals [Date - 1 day, Date + 1 day] and assign it as the index of df1. Then map df2['Date'] through df1['Type'], which gives, for each date in df2, the Type of the df1 interval that contains it (NaN if none does). Finally, compare that with df2['Type'] to create a boolean mask and use it to slice df2.

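# one closed interval [Date - 1 day, Date + 1 day] per row of df1
iix = pd.IntervalIndex.from_arrays(df1['Date'] - pd.Timedelta(days=1),
                                   df1['Date'] + pd.Timedelta(days=1), closed='both')
df1 = df1.set_index(iix)
# for each Date in df2, look up the Type of the df1 interval containing it (NaN if none)
s = df2['Date'].map(df1['Type'])
# keep only the rows whose own Type equals the looked-up Type
df_final = df2[df2['Type'] == s]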

Output (df_final):
   IBSN Type       Date
0     2    X 2014-08-16
2     9    X 2014-08-18
4     3    Y 2019-09-23
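The IntervalIndex lookup depends on the ±1-day intervals in df1 not overlapping. If two df1 dates are two days or less apart (even with different Types), pandas cannot map a date to a single interval and raises an error instead of returning matches. Below is a minimal sketch that checks `IntervalIndex.is_overlapping` first and falls back to a pairwise merge on Type when the intervals overlap. The function name `filter_by_type_and_date` is hypothetical, and the column names are taken from the question.

```python
import pandas as pd


def filter_by_type_and_date(df1, df2, days=1):
    delta = pd.Timedelta(days=days)
    iix = pd.IntervalIndex.from_arrays(df1['Date'] - delta, df1['Date'] + delta,
                                       closed='both')
    if not iix.is_overlapping:
        # Same idea as the answer: map each df2 Date to the Type of the df1
        # interval that contains it (NaN when no interval contains it).
        lookup = pd.Series(df1['Type'].to_numpy(), index=iix)
        return df2[df2['Type'] == df2['Date'].map(lookup)]
    # Overlapping intervals: compare each df2 row with every df1 row of the same Type.
    # reset_index() keeps df2's (assumed unnamed) index as a column called 'index'.
    pairs = df2.reset_index().merge(df1[['Type', 'Date']], on='Type',
                                    suffixes=('', '_df1'))
    close = (pairs['Date'] - pairs['Date_df1']).abs() <= delta
    return df2[df2.index.isin(pairs.loc[close, 'index'])]


df2 = pd.DataFrame({'IBSN': [2, 2, 9, 3, 3, 5],
                    'Type': ['X', 'D', 'X', 'H', 'Y', 'G'],
                    'Date': pd.to_datetime(['2014-08-16', '2019-09-22', '2014-08-18',
                                            '2019-09-22', '2019-09-23', '2019-09-22'])})

# Non-overlapping intervals (the question's df1): the IntervalIndex path is used.
df1 = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                    'Date': pd.to_datetime(['2014-08-17', '2019-09-22'])})
res = filter_by_type_and_date(df1, df2)

# Overlapping intervals (dates one day apart): the merge fallback is used.
df1_overlap = pd.DataFrame({'IBSN': [1, 1], 'Type': ['X', 'Y'],
                            'Date': pd.to_datetime(['2014-08-17', '2014-08-18'])})
res_overlap = filter_by_type_and_date(df1_overlap, df2)
print(res)
print(res_overlap)
```

The fallback costs more memory (one row per df2/df1 pair sharing a Type), so keeping the IntervalIndex path for the common non-overlapping case is a reasonable trade-off.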
