How to compare 2 Pandas dataframes and add a new column based on the comparison
I have two dataframes whose ticket_id columns need to be compared. If there is a match, I need to add a column to the first dataframe that comes from a column of the second dataframe. If a ticket_id has no match between the two dataframes, that means it is a new ticket, and it still needs to appear in the result.
I have tried using if statements but have not been able to compare them successfully.
df_A (current week help desk ticket report):
ticket_id  category  submitted  closed  status
1          critical  4/20/19    5/1/19  closed
2          low       4/23/19    5/2/19  closed
3          medium    4/26/19            open
4          low       5/1/19             open
df_B (previous week help desk ticket report):
ticket_id  category  submitted  closed  status
1          critical  4/20/19            open
2          low       4/23/19            open
3          medium    4/26/19            open
So I essentially want to make a new dataframe based on df_A, but take the previous week's status for each ticket_id and add it to the new dataframe as the last column. If a new ticket appears this week that was not there the previous week (i.e. ticket_id = 4), it should be kept, with a previous-week status of NA or blank (it doesn't really matter which).
Expected df_A:
ticket_id  category  submitted  closed  status  previous_week_status
1          critical  4/20/19    5/1/19  closed  open
2          low       4/23/19    5/2/19  closed  open
3          medium    4/26/19            open    open
4          low       5/1/19             open    NA
This should do it:
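df_A.set_index('ticket_id', inplace=True)
df_B.set_index('ticket_id', inplace=True)
# Assigning a Series aligns on the index (ticket_id);
# tickets missing from df_B (e.g. 4) get NaN
df_A['previous_week_status'] = df_B['status']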
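For a self-contained version of this idea, here is a sketch on the question's sample data (the blank closed dates are assumed to be missing values). It swaps in Series.map, which does the same ticket_id lookup without changing df_A's index:

```python
import pandas as pd

# Sample data from the question; blank "closed" cells are represented as None
df_A = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4],
    "category": ["critical", "low", "medium", "low"],
    "submitted": ["4/20/19", "4/23/19", "4/26/19", "5/1/19"],
    "closed": ["5/1/19", "5/2/19", None, None],
    "status": ["closed", "closed", "open", "open"],
})
df_B = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "category": ["critical", "low", "medium"],
    "submitted": ["4/20/19", "4/23/19", "4/26/19"],
    "closed": [None, None, None],
    "status": ["open", "open", "open"],
})

# Build a ticket_id -> status lookup from last week's report
previous_status = df_B.set_index("ticket_id")["status"]

# map() looks up each ticket_id; tickets absent from df_B (e.g. 4) become NaN
df_A["previous_week_status"] = df_A["ticket_id"].map(previous_status)
print(df_A)
```

Because map works on the ticket_id column directly, df_A keeps its default index, which can be convenient if other code expects ticket_id to remain a regular column.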
As @Erfan already indicated, it's probably best to solve this by renaming and merging the dataframes.
# Assumes ticket_id is a regular column in both dataframes (not the index)
df_B_reduced = (df_B.rename(columns={"status": "previous_week_status"})
                    .drop(columns=["category", "submitted", "closed"])  # drop duplicate info
                )
df_merged = df_A.merge(right=df_B_reduced,
                       how="left",             # if an entry is in A but not in B, add NA values
                       on=["ticket_id"],       # column to merge on
                       validate="one_to_one"   # (optional) check that ticket_id is actually a unique id
                       )
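Here is a runnable sketch of the merge approach on the question's sample data (blank closed dates assumed to be missing values). It additionally passes indicator=True, which is not part of the answer's code: this adds a _merge column marking each row as "both" or "left_only", so tickets that are new this week can be picked out without any if statements. Selecting the two needed columns before renaming is equivalent to the rename/drop step:

```python
import pandas as pd

# Sample data from the question; blank "closed" cells are represented as None
df_A = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4],
    "category": ["critical", "low", "medium", "low"],
    "submitted": ["4/20/19", "4/23/19", "4/26/19", "5/1/19"],
    "closed": ["5/1/19", "5/2/19", None, None],
    "status": ["closed", "closed", "open", "open"],
})
df_B = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "category": ["critical", "low", "medium"],
    "submitted": ["4/20/19", "4/23/19", "4/26/19"],
    "closed": [None, None, None],
    "status": ["open", "open", "open"],
})

# Keep only the key and last week's status, renamed to avoid a clash with df_A.status
df_B_reduced = df_B[["ticket_id", "status"]].rename(
    columns={"status": "previous_week_status"}
)

df_merged = df_A.merge(
    df_B_reduced,
    how="left",             # keep every ticket from this week's report
    on="ticket_id",
    validate="one_to_one",  # raises MergeError if ticket_id is duplicated
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)

# Tickets that exist only in the current week's report
new_tickets = df_merged.loc[df_merged["_merge"] == "left_only", "ticket_id"]

print(df_merged.drop(columns="_merge"))
print(new_tickets.tolist())
```

A left merge keeps the rows in df_A's order, so the result lines up with the expected df_A; drop the _merge column once you no longer need it.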
For more information, see Pandas Merging 101 or the official documentation.