
Compare two columns from two different data frames with two conditions

I am comparing the values of two columns, the key and the date, across two data frames. If the condition is met, a new column Flag should be set to "Y"; otherwise it should be "".

Condition: if the keys match and the date in df1 > the date in df2, then "Y", else "".

So we iterate through all rows of df1 and check whether the key has a match in df2. If it does, we compare DateF in df1 with Date in df2 for that key, and if DateF is greater we store "Y" in a new column Flag.

Update 1: df1 can contain multiple rows with the same key and different dates.

Df1:

Key  Date        Another
123  2022-03-04  Apple
321  2022-05-01  Red
234  2022-07-08  Green

Df2:

Key  Date
123  2022-03-01
321  2022-05-01
234  2022-07-01

Expected output. Explanation: in the first and third rows the keys match and DateF in df1 > Date in df2, so the flag is Y.

Key  Date        Another  Flag
123  2022-03-04  Apple    Y
321  2022-05-01  Red
234  2022-07-08  Green    Y

Code to create both data frames:

import pandas as pd

# df1
data = [[123, pd.to_datetime('2022-03-04'), 'Apple'],
        [321, pd.to_datetime('2022-05-01'), 'Red'],
        [234, pd.to_datetime('2022-07-08'), 'Green']]
df1 = pd.DataFrame(data, columns=['Key', 'DateF', 'Another'])

# df2
data1 = [[123, pd.to_datetime('2022-03-01')],
         [321, pd.to_datetime('2022-05-01')],
         [234, pd.to_datetime('2022-07-01')]]
df2 = pd.DataFrame(data1, columns=['Key', 'Date'])

I have tried this, but I think I am going wrong.

for i in df1.Key.unique():
   df1.loc[(df1[i] == df2[i]) & (r['DateF'] > df2['Date]), "Flag"] = "Y"

Thank you!

You can use pandas.Series.gt to compare the two dates, then pandas.DataFrame.loc with a boolean mask to create the new column and flag it at the same time.

# The date column in df1 is named 'DateF' in the question's setup code
df1.loc[df1['DateF'].gt(df2['Date']), "Flag"] = "Y"

# Output:

print(df1)

   Key      DateF Another Flag
0  123 2022-03-04   Apple    Y
1  321 2022-05-01     Red  NaN
2  234 2022-07-08   Green    Y
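Note that the comparison above lines rows up by index position, not by Key, so it only gives the intended result when both frames list the same keys in the same order. Here is a minimal sketch of a key-aligned variant using Series.map. It assumes the keys in df2 are unique; the extra row (a second 123 with an earlier date) and the shuffled df2 order are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Sample data: df1 repeats key 123 (see "Update 1"), df2 is in a different order
df1 = pd.DataFrame({
    "Key": [123, 321, 234, 123],
    "DateF": pd.to_datetime(["2022-03-04", "2022-05-01", "2022-07-08", "2022-02-01"]),
    "Another": ["Apple", "Red", "Green", "Pear"],
})
df2 = pd.DataFrame({
    "Key": [234, 123, 321],
    "Date": pd.to_datetime(["2022-07-01", "2022-03-01", "2022-05-01"]),
})

# Look up the df2 date for every df1 row by Key (requires unique keys in df2)
ref = df1["Key"].map(df2.set_index("Key")["Date"])

# Start with an empty flag, then mark rows where DateF is later than the df2 date
df1["Flag"] = ""
df1.loc[df1["DateF"].gt(ref), "Flag"] = "Y"

flags = df1["Flag"].tolist()
print(df1)
```

Keys that do not appear in df2 map to NaT, and a comparison against NaT is False, so those rows keep the empty flag.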

You can use merge if your dataframes are not the same size:

# Left join keeps every row of df1 and attaches the matching df2 date
final = df1.merge(df2, on='Key', how='left')
final.loc[final['DateF'] > final['Date'], "Flag"] = "Y"
final = final.drop(['Date'], axis=1)

   Key      DateF Another Flag
0  123 2022-03-04   Apple    Y
1  321 2022-05-01     Red  NaN
2  234 2022-07-08   Green    Y
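The question asks for "" rather than a missing value in unflagged rows. One possible way to get that from the merge approach is numpy.where, sketched below. The row with key 999 is a made-up example of a key that is absent from df2; it is not in the question's data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Key": [123, 321, 234, 999],  # 999 is a hypothetical key missing from df2
    "DateF": pd.to_datetime(["2022-03-04", "2022-05-01", "2022-07-08", "2022-01-01"]),
    "Another": ["Apple", "Red", "Green", "Blue"],
})
df2 = pd.DataFrame({
    "Key": [123, 321, 234],
    "Date": pd.to_datetime(["2022-03-01", "2022-05-01", "2022-07-01"]),
})

# Left join keeps all df1 rows; unmatched keys get NaT in 'Date'
final = df1.merge(df2, on="Key", how="left")

# "Y" where DateF is later than Date, "" everywhere else (including NaT rows)
final["Flag"] = np.where(final["DateF"] > final["Date"], "Y", "")
final = final.drop(columns="Date")

flags = final["Flag"].tolist()
print(final)
```

If df2 could contain the same key more than once, the left join would duplicate the matching df1 rows, so this sketch assumes unique keys in df2.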

This code is not as elegant as the other answers, but it also works:

ref_dates = dict(zip(df2.Key, df2.Date))
# Keys missing from df2 default to pd.Timestamp.max, so they are never flagged
df1['Flag'] = ['Y' if date > ref_dates.get(key, pd.Timestamp.max) else ''
               for key, date in zip(df1.Key, df1.DateF)]

We first create a dictionary (ref_dates) with the dates in df2, and then iterate over df1, comparing them with DateF.
