How to update a column in a pandas DataFrame based on a column from another DataFrame
How do I look up values between two pandas DataFrames and update a column in the first DataFrame from a column in the second?
I have two DataFrames, df and df1.
df:
|system_time|status|id|date|
|2022-03-04T07:52:26Z|Pending|772|2022-03-04 07:52:26+00:00|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-13T01:34:44Z|Pending|1052|2022-08-13 01:34:44+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Pending|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|
df1:
|id|task_status|
|1052|Complete|
|889|Pending|
|772|Complete|
|963|Pending|
Column types in df:
system_time - object
status - object
id - int64
date - object
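I want to look up df against df1. If an id appears in both df and df1, the status in df should become the task_status from df1. Because df contains duplicate records per id, only the latest record should be kept and its status updated from df1; for ids with no match, the status in df should stay as it is. In df, I converted system_time to a date column using: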
df['date']=pd.to_datetime(df['system_time'])
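For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the two frames from the tables above. One assumption to flag: instead of `pd.to_datetime` it parses each string with `pd.Timestamp`, because the column mixes timestamps with and without fractional seconds, and depending on the pandas version a single inferred format may not accept both.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = pd.DataFrame({
    "system_time": [
        "2022-03-04T07:52:26Z",
        "2022-06-22T17:52:42Z",
        "2022-08-13T01:34:44Z",
        "2022-08-24T01:46:31.115Z",
        "2022-08-14T06:04:54.736Z",
        "2022-03-04T17:51:15.025Z",
        "2022-08-24T06:24:54.736Z",
    ],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Assumption: parse each ISO 8601 string on its own, so rows with and
# without fractional seconds are both accepted.
df["date"] = df["system_time"].apply(pd.Timestamp)
```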
Expected output:
|system_time|status|id|date|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Complete|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|
Here is one way to do it, using map:
# map the df1 status to df when ID is found
df['status']=df['id'].map(df1.set_index(['id'])['task_status'])
df
system_time status id date
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00
4 2022-08-14T06:04:54.736Z NaN 1053 2022-08-14 06:04:54.736000+00:00
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00
6 2022-08-24T06:24:54.736Z NaN 999 2022-08-24 06:24:54.736000+00:00
Alternatively, if you only want to update the status when the id is found in df1:
df['status']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).notna()),
(df['id'].map(df1.set_index(['id'])['task_status'])) )
df
system_time status id date
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00
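The `mask` version above evaluates the same `map` twice. A shorter spelling, sketched here on a reduced copy of the data, builds the lookup Series once and uses `fillna` to fall back to the existing status wherever the id has no entry in df1. The name `lookup` is only illustrative.

```python
import pandas as pd

# Reduced sample: one matching id (772) and two ids missing from df1.
df = pd.DataFrame({
    "id": [772, 1053, 999],
    "status": ["Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [772, 889],
    "task_status": ["Complete", "Pending"],
})

# Build the id -> task_status lookup once.
lookup = df1.set_index("id")["task_status"]

# Unmatched ids map to NaN; fillna restores their original status.
df["status"] = df["id"].map(lookup).fillna(df["status"])
```

This assumes the ids in df1 are unique; if they are not, deduplicate df1 before calling `set_index`.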
For the unmatched ids:
# Set the 'unmatched' column to 'unmatched' when the id is NOT found in df1.
# When it is found, keep the status as-is.
# You may want to keep the previous step and this one together.
df['unmatched']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).isna()),
'unmatched' )
df
system_time status id date unmatched
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00 Complete
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00 Pending
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00 Complete
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00 Complete
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00 unmatched
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00 Complete
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00 unmatched
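If all you need is a flag saying whether an id exists in df1, a boolean column may be easier to filter on than a string marker. A minimal sketch on a reduced copy of the data; the column name `matched` is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [772, 963, 1053, 999],
    "status": ["Complete", "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# True where the id appears anywhere in df1, False otherwise.
df["matched"] = df["id"].isin(df1["id"])

# Rows whose id has no counterpart in df1.
unmatched_rows = df[~df["matched"]]
```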
Keep the last row for each id, based on system_time:
df.sort_values('system_time').drop_duplicates(subset=['id'], keep='last')
system_time status id date unmatched
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00 Complete
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00 Pending
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00 unmatched
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00 Complete
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00 unmatched
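One caveat about sorting on `system_time`: it is a string column, so the order is lexicographic. That works for the sample data, but it can pick the wrong row when two timestamps differ only by fractional seconds, because `.` sorts before `Z`. The sketch below uses two made-up rows (not from the question) to show the difference between sorting on the string and sorting on the parsed `date` column.

```python
import pandas as pd

# Hypothetical rows: same second, one with fractional seconds.
df = pd.DataFrame({
    "system_time": ["2022-03-04T07:52:26.500Z", "2022-03-04T07:52:26Z"],
    "status": ["Complete", "Pending"],
    "id": [772, 772],
})
# Parse each string individually so both formats are accepted.
df["date"] = df["system_time"].apply(pd.Timestamp)

# Lexicographic order puts "...26.500Z" before "...26Z",
# so keep='last' retains the earlier event.
by_string = df.sort_values("system_time").drop_duplicates(subset=["id"], keep="last")

# Chronological order retains the row that is actually the latest.
by_date = df.sort_values("date").drop_duplicates(subset=["id"], keep="last")
```

Sorting on `date` is the safer default whenever the timestamp strings are not all in exactly the same format.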
You need to use merge:
df = df.merge(right=df1, on='id',how='left')
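On its own the left merge only adds a `task_status` column next to `status`. Here is a sketch of how the remaining steps might look, under the assumption that the goal is the expected output in the question: keep the latest row per id, merge, overwrite `status` where a match exists, then drop the helper column. It runs on a reduced copy of the data, and the names `latest` and `merged` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "system_time": [
        "2022-03-04T07:52:26Z",
        "2022-06-22T17:52:42Z",
        "2022-03-04T17:51:15.025Z",
        "2022-08-24T06:24:54.736Z",
    ],
    "status": ["Pending", "Pending", "Pending", "Inprogress"],
    "id": [772, 963, 772, 999],
})
df1 = pd.DataFrame({
    "id": [772, 889, 963],
    "task_status": ["Complete", "Pending", "Pending"],
})

# Parse timestamps individually, then keep the latest row per id.
df["date"] = df["system_time"].apply(pd.Timestamp)
latest = df.sort_values("date").drop_duplicates(subset=["id"], keep="last")

# Left merge keeps every row of `latest`; unmatched ids get NaN task_status.
merged = latest.merge(df1, on="id", how="left")

# Prefer task_status where present, otherwise keep the original status.
merged["status"] = merged["task_status"].fillna(merged["status"])
merged = merged.drop(columns="task_status")
```

As with `map`, this assumes each id occurs at most once in df1; duplicate ids there would duplicate rows in the merge result.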