How to update a column in pandas DataFrame based on column from another DataFrame
How to do lookup on two pandas dataframe and update its value in first dataframe column from another dataframe column?
I have 2 DataFrames, df and df1.
df:
|system_time|status|id|date|
|2022-03-04T07:52:26Z|Pending|772|2022-03-04 07:52:26+00:00|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-13T01:34:44Z|Pending|1052|2022-08-13 01:34:44+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Pending|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|
df1:
|id|task_status|
|1052|Complete|
|889|Pending|
|772|Complete|
|963|Pending|
Column types in df:
system_time - object
status - object
id - int64
date - object
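I want to apply a lookup from df to df1 here. If an id in df matches an id in df1, the status in df should become the task_status from df1. Since df contains duplicate records for some ids, I need to take the latest record for each id and update its status from df1; for ids with no match, the status should stay as it is in df. In df, I converted system_time to a date column using: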
df['date']=pd.to_datetime(df['system_time'])
Expected output:
|system_time|status|id|date|
|2022-06-22T17:52:42Z|Pending|963|2022-06-22 17:52:42+00:00|
|2022-08-24T01:46:31.115Z|Complete|1052|2022-08-24 01:46:31.115000+00:00|
|2022-08-14T06:04:54.736Z|Pending|1053|2022-08-14 06:04:54.736000+00:00|
|2022-03-04T17:51:15.025Z|Complete|772|2022-03-04 17:51:15.025000+00:00|
|2022-08-24T06:24:54.736Z|Inprogress|999|2022-08-24 06:24:54.736000+00:00|
Here is one way to do it using map:
# map the df1 status to df when ID is found
df['status']=df['id'].map(df1.set_index(['id'])['task_status'])
df
system_time status id date
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00
4 2022-08-14T06:04:54.736Z NaN 1053 2022-08-14 06:04:54.736000+00:00
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00
6 2022-08-24T06:24:54.736Z NaN 999 2022-08-24 06:24:54.736000+00:00
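The NaN values above appear because ids 1053 and 999 do not exist in df1. A minimal sketch of one way to keep the original status for those rows is to chain `fillna` onto the mapped result. The frames below are trimmed to the `id` and `status` columns from the question, the name `lookup` is only illustrative, and the sketch assumes ids in df1 are unique.

```python
import pandas as pd

# Sample data from the question, reduced to the relevant columns
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Series indexed by id, used as the lookup table (assumes unique ids in df1)
lookup = df1.set_index("id")["task_status"]

# ids missing from df1 map to NaN; fillna puts the original status back
df["status"] = df["id"].map(lookup).fillna(df["status"])
print(df)
```

Building the lookup Series once also avoids repeating `set_index` for every expression that needs it.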
Alternatively, if you only want to update the status when the id is found in df1:
df['status']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).notna()),
(df['id'].map(df1.set_index(['id'])['task_status'])) )
df
system_time status id date
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00
For the unmatched ids:
# set the 'unmatched' column to 'unmatched' when the id is NOT found in df1.
# When it is found, keep the status as-is.
# You may want to run the previous step and this one together.
df['unmatched']=df['status'].mask((df['id'].map(df1.set_index(['id'])['task_status']).isna()),
'unmatched' )
df
system_time status id date unmatched
0 2022-03-04T07:52:26Z Complete 772 2022-03-04 07:52:26+00:00 Complete
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00 Pending
2 2022-08-13T01:34:44Z Complete 1052 2022-08-13 01:34:44+00:00 Complete
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00 Complete
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00 unmatched
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00 Complete
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00 unmatched
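If a plain membership test is enough, the same flag can be sketched with `isin` and `where` instead of mapping through df1. This is only an alternative illustration, not the answerer's code: the variable `in_df1` is a made-up name, and the frames are again trimmed to the columns that matter.

```python
import pandas as pd

# Sample data from the question, reduced to the relevant columns
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# True for rows whose id also appears in df1
in_df1 = df["id"].isin(df1["id"])

# where() keeps the status for matched rows and substitutes the label otherwise
df["unmatched"] = df["status"].where(in_df1, "unmatched")
print(df)
```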
To keep only the last row per id based on system time:
df.sort_values('system_time').drop_duplicates(subset=['id'], keep='last')
system_time status id date unmatched
5 2022-03-04T17:51:15.025Z Complete 772 2022-03-04 17:51:15.025000+00:00 Complete
1 2022-06-22T17:52:42Z Pending 963 2022-06-22 17:52:42+00:00 Pending
4 2022-08-14T06:04:54.736Z Pending 1053 2022-08-14 06:04:54.736000+00:00 unmatched
3 2022-08-24T01:46:31.115Z Complete 1052 2022-08-24 01:46:31.115000+00:00 Complete
6 2022-08-24T06:24:54.736Z Inprogress 999 2022-08-24 06:24:54.736000+00:00 unmatched
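Putting the pieces together, here is a rough end-to-end sketch that should reproduce the expected output from the question. It is one possible arrangement, not a definitive one: it sorts on parsed timestamps rather than the raw strings, parses each string individually because the column mixes values with and without fractional seconds, and assumes ids in df1 are unique. The names `latest`, `lookup` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "system_time": [
        "2022-03-04T07:52:26Z", "2022-06-22T17:52:42Z", "2022-08-13T01:34:44Z",
        "2022-08-24T01:46:31.115Z", "2022-08-14T06:04:54.736Z",
        "2022-03-04T17:51:15.025Z", "2022-08-24T06:24:54.736Z",
    ],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Parse each string on its own, since the formats are not uniform
df["date"] = df["system_time"].map(pd.Timestamp)

# Keep only the most recent row per id
latest = df.sort_values("date").drop_duplicates(subset="id", keep="last")

# Take task_status from df1 where the id exists, else keep the current status
lookup = df1.set_index("id")["task_status"]
latest = latest.assign(status=latest["id"].map(lookup).fillna(latest["status"]))

# Restore the original row order
result = latest.sort_index()
print(result[["system_time", "status", "id"]])
```

Deduplicating before the lookup means the status is only computed for the rows that are kept.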
You need to use merge:
df = df.merge(right=df1, on='id',how='left')
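Note that the left merge on its own only adds a `task_status` column next to `status`; it does not overwrite anything. A minimal sketch of the follow-up step, assuming ids in df1 are unique so the row count stays the same, could look like this (the name `merged` is illustrative, and the frames are trimmed to the relevant columns):

```python
import pandas as pd

# Sample data from the question, reduced to the relevant columns
df = pd.DataFrame({
    "id": [772, 963, 1052, 1052, 1053, 772, 999],
    "status": ["Pending", "Pending", "Pending", "Complete",
               "Pending", "Pending", "Inprogress"],
})
df1 = pd.DataFrame({
    "id": [1052, 889, 772, 963],
    "task_status": ["Complete", "Pending", "Complete", "Pending"],
})

# Left merge keeps every row of df; task_status is NaN where the id is not in df1
merged = df.merge(right=df1, on="id", how="left")

# Prefer task_status when present, otherwise fall back to the existing status
merged["status"] = merged["task_status"].fillna(merged["status"])

# The helper column is no longer needed
merged = merged.drop(columns="task_status")
print(merged)
```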