
How to do a lookup across two pandas DataFrames and update a column in the first from a column in the second?

I have two DataFrames, df and df1.

df:

|system_time|status|id|date|
|2022-03-04T07:52:26Z|Pending|772|2022-03-04 07:52:26+00:00|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-13T01:34:44Z|Pending|1052|2022-08-13 01:34:44+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Pending|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|

df1:

|id|task_status|
|1052|Complete|
|889|Pending|
|772|Complete|
|963|Pending|

Column types in df:

system_time  - object
status - object
id    - int64
date   - object

I want to look up each id in df against df1. If an id appears in both, the status in df should be replaced by task_status from df1. Because df contains duplicate ids, I need to keep only the latest record per id and update its status from df1; for ids with no match in df1, the status should stay as it is in df. I created the date column in df from system_time using:

df['date']=pd.to_datetime(df['system_time'])

Expected output:

|system_time|status|id|date|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Complete|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|

Here is one way to do it using map:

# map task_status from df1 onto df by id; ids missing from df1 become NaN
df['status']=df['id'].map(df1.set_index(['id'])['task_status'])
df
    system_time     status  id  date
0   2022-03-04T07:52:26Z    Complete    772     2022-03-04 07:52:26+00:00
1   2022-06-22T17:52:42Z    Pending     963     2022-06-22 17:52:42+00:00
2   2022-08-13T01:34:44Z    Complete    1052    2022-08-13 01:34:44+00:00
3   2022-08-24T01:46:31.115Z    Complete    1052    2022-08-24 01:46:31.115000+00:00
4   2022-08-14T06:04:54.736Z    NaN     1053    2022-08-14 06:04:54.736000+00:00
5   2022-03-04T17:51:15.025Z    Complete    772     2022-03-04 17:51:15.025000+00:00
6   2022-08-24T06:24:54.736Z    NaN     999     2022-08-24 06:24:54.736000+00:00

Alternatively, if you want to update the status only when the id is found in df1:

df['status']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).notna()), 
                           (df['id'].map(df1.set_index(['id'])['task_status'])) )
df
    system_time     status  id  date
0   2022-03-04T07:52:26Z    Complete    772     2022-03-04 07:52:26+00:00
1   2022-06-22T17:52:42Z    Pending     963     2022-06-22 17:52:42+00:00
2   2022-08-13T01:34:44Z    Complete    1052    2022-08-13 01:34:44+00:00
3   2022-08-24T01:46:31.115Z    Complete    1052    2022-08-24 01:46:31.115000+00:00
4   2022-08-14T06:04:54.736Z    Pending     1053    2022-08-14 06:04:54.736000+00:00
5   2022-03-04T17:51:15.025Z    Complete    772     2022-03-04 17:51:15.025000+00:00
6   2022-08-24T06:24:54.736Z    Inprogress  999     2022-08-24 06:24:54.736000+00:00
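
The same result can be written more compactly by mapping once and filling the gaps with the existing status. This is a minimal sketch, not part of the original answer; the sample frames are rebuilt from the question (only the id and status columns), the name lookup is introduced here for illustration, and it assumes each id appears only once in df1.

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns needed here).
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Build the lookup once: index = id, values = task_status.
# Assumption: ids in df1 are unique.
lookup = df1.set_index("id")["task_status"]

# map() gives NaN for ids that are missing from df1;
# fillna() puts the original status back for those rows.
df["status"] = df["id"].map(lookup).fillna(df["status"])
print(df)
```

Building the lookup Series once also avoids calling set_index twice, as the mask version above does.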

For unmatched ids:

# set the 'unmatched' column to 'unmatched' when the id is NOT found in df1
# when it is found, keep the status as-is
# you may want to run this together with the previous snippet

df['unmatched']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).isna()), 
                           'unmatched' )
df
system_time     status  id  date    unmatched
0   2022-03-04T07:52:26Z    Complete    772     2022-03-04 07:52:26+00:00   Complete
1   2022-06-22T17:52:42Z    Pending     963     2022-06-22 17:52:42+00:00   Pending
2   2022-08-13T01:34:44Z    Complete    1052    2022-08-13 01:34:44+00:00   Complete
3   2022-08-24T01:46:31.115Z    Complete    1052    2022-08-24 01:46:31.115000+00:00    Complete
4   2022-08-14T06:04:54.736Z    Pending     1053    2022-08-14 06:04:54.736000+00:00    unmatched
5   2022-03-04T17:51:15.025Z    Complete    772     2022-03-04 17:51:15.025000+00:00    Complete
6   2022-08-24T06:24:54.736Z    Inprogress  999     2022-08-24 06:24:54.736000+00:00    unmatched
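
If only the matched/unmatched flag is needed, isin can test membership directly without building a mapped Series. A small sketch under the same sample data (rebuilt from the question, using the original statuses); the variable name matched is an assumption made for this example.

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns needed here).
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# True where the id exists in df1, False otherwise.
matched = df["id"].isin(df1["id"])

# where() keeps the status for matched rows and writes 'unmatched' elsewhere.
df["unmatched"] = df["status"].where(matched, "unmatched")
print(df)
```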

To keep only the last row per id, based on system_time:

df.sort_values('system_time').drop_duplicates(subset=['id'], keep='last')
system_time     status  id  date    unmatched
5   2022-03-04T17:51:15.025Z    Pending     772     2022-03-04 17:51:15.025000+00:00    Pending
1   2022-06-22T17:52:42Z    Pending     963     2022-06-22 17:52:42+00:00   Pending
4   2022-08-14T06:04:54.736Z    Pending     1053    2022-08-14 06:04:54.736000+00:00    unmatched
3   2022-08-24T01:46:31.115Z    Complete    1052    2022-08-24 01:46:31.115000+00:00    Complete
6   2022-08-24T06:24:54.736Z    Inprogress  999     2022-08-24 06:24:54.736000+00:00    unmatched
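
Putting the steps together gives the expected output from the question. The sketch below is one possible arrangement, not the answerer's exact code: it sorts on parsed timestamps rather than on the system_time strings (string order can differ from time order when some values carry fractional seconds and others do not), and it parses each value with pd.Timestamp, which is a choice made here to avoid depending on how pd.to_datetime infers a format for mixed inputs. The final sort_index is also an addition, used only to restore the original row order.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "system_time": [
        "2022-03-04T07:52:26Z", "2022-06-22T17:52:42Z",
        "2022-08-13T01:34:44Z", "2022-08-24T01:46:31.115Z",
        "2022-08-14T06:04:54.736Z", "2022-03-04T17:51:15.025Z",
        "2022-08-24T06:24:54.736Z",
    ],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Parse every timestamp individually (assumption: all values are ISO 8601).
df["date"] = df["system_time"].map(pd.Timestamp)

# Update status from df1 where the id matches, keep the old status otherwise.
lookup = df1.set_index("id")["task_status"]
df["status"] = df["id"].map(lookup).fillna(df["status"])

# Keep the latest row per id, then restore the original row order.
result = (
    df.sort_values("date")
      .drop_duplicates(subset="id", keep="last")
      .sort_index()
)
print(result)
```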

You need to use merge:

df = df.merge(right=df1, on='id',how='left')
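
On its own, the left merge only adds a task_status column next to status; it does not overwrite status. A possible way to finish the job is sketched below, assuming the sample data from the question and unique ids in df1 (duplicate ids in df1 would duplicate rows in the merge). The name merged is introduced here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (only the columns needed here).
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Left merge keeps every row of df; task_status is NaN where the id is not in df1.
merged = df.merge(right=df1, on="id", how="left")

# Prefer task_status when present, otherwise fall back to the existing status.
merged["status"] = merged["task_status"].fillna(merged["status"])

# Drop the helper column so the frame has its original columns again.
merged = merged.drop(columns="task_status")
print(merged)
```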
