Python Pandas - How to compare values from two columns of a dataframe to another dataframe's columns?

I have two dataframes, and I need to compare two of their columns against each other based on a condition and print the output. For example:

df1:

| ID    | Date      | value  |
| 248   | 2021-10-30| 4.5    |
| 249   | 2021-09-21| 5.0    |
| 100   | 2021-02-01| 3,2    |

df2:

| ID    | Date      | value  |
| 245   | 2021-12-14| 4.5    |
| 246   | 2021-09-21| 5.0    |
| 247   | 2021-10-30| 3,2    |
| 248   | 2021-10-30| 3,1    |
| 249   | 2021-10-30| 2,2    |
| 250   | 2021-10-30| 6,3    |
| 251   | 2021-10-30| 9,1    |
| 252   | 2021-10-30| 2,0    |

I want to write code that compares the ID and Date columns between the two dataframes under the following conditions:

  • If both the ID and the Date in df1 match df2: df1['compare'] = 'Both matching'

  • If the ID in df1 matches df2 but the Date does not: df1['compare'] = 'Date not matching'

  • If the ID in df1 is not found in df2: df1['compare'] = 'ID not available'

My resulting df1 should look like this:

df1 (expected result):

| ID    | Date      | value  | compare                         |
| 248   | 2021-10-30| 4.5    | Both matching                   |
| 249   | 2021-09-21| 5.0    | Id matching - Date not matching |
| 100   | 2021-02-01| 3,2    | Id not available                |

How can I do this with a pandas dataframe in Python?

I suggest using iterrows. It may not be the best approach, since iterating row by row is slow on large dataframes, but it solves your problem:

compareColumn = []
for index, row in df1.iterrows():
  # All df2 rows that share this row's ID
  df2Row = df2[df2["ID"] == row["ID"]]
  if df2Row.shape[0] == 0:
    compareColumn.append("ID not available")
  else:
    check = False
    for jndex, row2 in df2Row.iterrows():
      if row2["Date"] == row["Date"]:
        compareColumn.append("Both matching")
        check = True
        break
    # The ID exists in df2, but no row with that ID has the same date
    if not check:
      compareColumn.append("Date not matching")
df1["compare"] = compareColumn
df1

Output:

    ID        Date value            compare
0  248  2021-10-30   4.5      Both matching
1  249  2021-09-21     5  Date not matching
2  100  2021-02-01   3.2   ID not available
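For larger frames, the same three labels can probably be produced without a Python-level loop. The following is a minimal sketch, not part of the original answer: it rebuilds the sample frames from the question (with `value` kept as strings, which is an assumption), uses `isin` for the ID check, a left `merge` with `indicator=True` for the (ID, Date) check, and `numpy.select` to pick the label.

```python
import numpy as np
import pandas as pd

# Sample frames from the question (values kept as strings, as posted).
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": ["4.5", "5.0", "3,2", "3,1", "2,2", "6,3", "9,1", "2,0"],
})

# True where the ID occurs anywhere in df2.
id_found = df1["ID"].isin(df2["ID"]).to_numpy()

# True where the (ID, Date) pair occurs in df2. Merging against the
# de-duplicated key pairs keeps exactly one output row per df1 row.
pairs = df2[["ID", "Date"]].drop_duplicates()
merged = df1[["ID", "Date"]].merge(
    pairs, on=["ID", "Date"], how="left", indicator=True
)
pair_found = (merged["_merge"] == "both").to_numpy()

# Conditions are checked in order; the first one that holds wins.
df1["compare"] = np.select(
    [pair_found, id_found],
    ["Both matching", "Date not matching"],
    default="ID not available",
)
print(df1)
```

Unlike the loop, this version never touches rows one at a time, and it tolerates repeated IDs in df2 because the pair check looks for any row with the same ID and date.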

Suppose the 'ID' column is the index of both dataframes; then we can do it like this:

def f(x):
    # x.name is the row's index label, i.e. its ID
    if x.name in df2.index:
        return 'Both matching' if x['Date']==df2.loc[x.name,'Date'] else 'Date not matching'
    return 'ID not available'

# axis=1 applies f to each row
df1 = df1.assign(compare=df1.apply(f, axis=1))

print(df1)

           Date value            compare
ID                                      
248  2021-10-30   4.5      Both matching
249  2021-09-21   5.0  Date not matching
100  2021-02-01   3,2   ID not available
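If 'ID' is still an ordinary column, a similar lookup might be written with `Series.map` instead of changing the index of df1. This is only a sketch under two assumptions that the answer above does not state: each ID appears at most once in df2 (otherwise `map` raises an error), and df2 has no missing dates. The helper name `label` is made up for illustration.

```python
import pandas as pd

# Sample frames from the question, with ID as a regular column.
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": ["4.5", "5.0", "3,2"],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21"] + ["2021-10-30"] * 6,
    "value": ["4.5", "5.0", "3,2", "3,1", "2,2", "6,3", "9,1", "2,0"],
})

# Look up the df2 date for every df1 ID; IDs absent from df2 give NaN.
# Assumption: IDs are unique in df2.
date_in_df2 = df1["ID"].map(df2.set_index("ID")["Date"])

def label(own_date, other_date):
    """Hypothetical helper: pick the comparison label for one row."""
    if pd.isna(other_date):
        return "ID not available"
    return "Both matching" if own_date == other_date else "Date not matching"

df1["compare"] = [label(a, b) for a, b in zip(df1["Date"], date_in_df2)]
print(df1)
```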
