Python pandas: how to compare values from two columns of one DataFrame with the columns of another DataFrame?
I have two DataFrames, and I need to compare them on two columns based on some conditions and print the output. For example:
df1:

| ID | Date | value |
|---|---|---|
| 248 | 2021-10-30 | 4.5 |
| 249 | 2021-09-21 | 5.0 |
| 100 | 2021-02-01 | 3.2 |
df2:

| ID | Date | value |
|---|---|---|
| 245 | 2021-12-14 | 4.5 |
| 246 | 2021-09-21 | 5.0 |
| 247 | 2021-10-30 | 3.2 |
| 248 | 2021-10-30 | 3.1 |
| 249 | 2021-10-30 | 2.2 |
| 250 | 2021-10-30 | 6.3 |
| 251 | 2021-10-30 | 9.1 |
| 252 | 2021-10-30 | 2.0 |
I want to write code that compares the ID and Date columns between the two DataFrames and fills a new `compare` column in df1 according to these conditions:

- If both the ID and the Date from df1 match a row in df2: `df1['compare'] = 'Both matching'`
- If the ID matches but the Date does not: `df1['compare'] = 'Date not matching'`
- If the ID from df1 is not found in df2: `df1['compare'] = 'ID not available'`
My resulting df1 should look like this:

df1 (expected result):
| ID | Date | value | compare |
|---|---|---|---|
| 248 | 2021-10-30 | 4.5 | Both matching |
| 249 | 2021-09-21 | 5.0 | Date not matching |
| 100 | 2021-02-01 | 3.2 | ID not available |
How can I do this with a pandas DataFrame?
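To reproduce the problem, the sample frames can be built like this. This is a sketch under two assumptions: Date is kept as plain strings in both frames, and the comma decimals such as "3,2" are read as decimal points (3.2).

```python
import pandas as pd

# Sample data from the question; Date stored as strings (assumption)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": [4.5, 5.0, 3.2],
})

# df2: IDs 247 to 252 all share the date 2021-10-30
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": [4.5, 5.0, 3.2, 3.1, 2.2, 6.3, 9.1, 2.0],
})

print(df1)
print(df2)
```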
I suggest using `iterrows`. It is not the fastest approach, since it loops over the rows in Python, but it solves your problem:
```python
compareColumn = []
for index, row in df1.iterrows():
    # All rows of df2 that share this row's ID
    df2Row = df2[df2["ID"] == row["ID"]]
    if df2Row.shape[0] == 0:
        compareColumn.append("ID not available")
    else:
        check = False
        # Look for a df2 row with the same ID and the same Date
        for jndex, row2 in df2Row.iterrows():
            if row2["Date"] == row["Date"]:
                compareColumn.append("Both matching")
                check = True
                break
        if not check:
            compareColumn.append("Date not matching")
df1["compare"] = compareColumn
df1
```
| | ID | Date | value | compare |
|---|---|---|---|---|
| 0 | 248 | 2021-10-30 | 4.5 | Both matching |
| 1 | 249 | 2021-09-21 | 5.0 | Date not matching |
| 2 | 100 | 2021-02-01 | 3.2 | ID not available |
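For larger frames, the same three-way logic can be vectorized to avoid the Python-level loop. This is a sketch, not part of the answer above: it swaps `iterrows` for `DataFrame.merge(..., indicator=True)`, `Series.isin` and `numpy.select`, and assumes Date is stored the same way (here, as strings) in both frames. Because it merges against the distinct (ID, Date) pairs, it also tolerates an ID that appears several times in df2.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Date kept as strings; "3,2" read as 3.2)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": [4.5, 5.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": [4.5, 5.0, 3.2, 3.1, 2.2, 6.3, 9.1, 2.0],
})

# Condition 1: the (ID, Date) pair exists in df2.
# A left merge against the distinct pairs keeps df1's row order and row count;
# indicator=True adds a "_merge" column equal to "both" when the pair was found.
keys2 = df2[["ID", "Date"]].drop_duplicates()
merged = df1.merge(keys2, on=["ID", "Date"], how="left", indicator=True)
both_match = (merged["_merge"] == "both").to_numpy()

# Condition 2: the ID exists anywhere in df2.
id_match = df1["ID"].isin(df2["ID"]).to_numpy()

# np.select takes the first condition that is True, so "Both matching" wins over
# "Date not matching"; rows matching neither fall through to the default.
df1["compare"] = np.select(
    [both_match, id_match],
    ["Both matching", "Date not matching"],
    default="ID not available",
)
print(df1)
```

The order of the condition list matters: a row that matches on both ID and Date also satisfies `id_match`, so the stricter test has to come first.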
Assuming the 'ID' column is the index of both DataFrames, you can do it like this:
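```python
def f(x):
    # x is one row of df1; x.name is its index label, i.e. the ID
    if x.name in df2.index:
        # Assumes IDs in df2 are unique, so df2.loc returns a single Date
        return 'Both matching' if x['Date'] == df2.loc[x.name, 'Date'] else 'Date not matching'
    return 'ID not available'

# apply with axis=1 calls f once per row
df1 = df1.assign(compare=df1.apply(f, axis=1))
print(df1)
```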
```
           Date  value            compare
ID
248  2021-10-30    4.5      Both matching
249  2021-09-21    5.0  Date not matching
100  2021-02-01    3.2   ID not available
```
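The same ID-lookup idea can also be vectorized without making ID the index of df1 and without `apply`. A minimal sketch, assuming IDs in df2 are unique (the same assumption `df2.loc[x.name, 'Date']` relies on) and Date is stored as strings in both frames; it uses `Series.map` and `numpy.where`:

```python
import numpy as np
import pandas as pd

# Sample data from the question (Date kept as strings; "3,2" read as 3.2)
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": [4.5, 5.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": [4.5, 5.0, 3.2, 3.1, 2.2, 6.3, 9.1, 2.0],
})

# Look up each df1 ID in df2, giving df2's Date for that ID (NaN if absent).
# Series.map with a Series requires df2's IDs to be unique.
date_in_df2 = df1["ID"].map(df2.set_index("ID")["Date"])

# Missing ID first; otherwise compare the looked-up Date with df1's Date
# (NaN never equals a string, so missing IDs are handled by the outer test).
df1["compare"] = np.where(
    date_in_df2.isna(),
    "ID not available",
    np.where(date_in_df2 == df1["Date"], "Both matching", "Date not matching"),
)
print(df1)
```

If df2 can contain the same ID more than once, `map` raises an error instead of silently picking one row, so a merge-based comparison on (ID, Date) is the safer choice in that case.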