How to update data frame column values from another data frame based on a conditional match in polars?
How to update data frame column values from another data frame based on a conditional match in pandas
I have two data frames:
df_A:
{'last_name': {0: 'Williams', 1: 'Henry', 2: 'XYX', 3: 'Smith', 4: 'David', 5: 'Freeman', 6: 'Walter', 7: 'Test_A', 8: 'Mallesham', 9: 'Mallesham', 10: 'Henry', 11: 'Smith'}, 'first_name': {0: 'Henry', 1: 'Williams', 2: 'ABC', 3: 'David', 4: 'Smith', 5: 'Walter', 6: 'Freeman', 7: 'Test_B', 8: 'Yamulla', 9: 'Yamulla', 10: 'Williams', 11: 'David'}, 'full_name': {0: 'Williams Henry', 1: 'Henry Williams', 2: 'XYX ABC', 3: 'Smith David', 4: 'David Smith', 5: 'Freeman Walter', 6: 'Walter Freeman', 7: 'Test_A Test_B', 8: 'Mallesham Yamulla', 9: 'Mallesham Yamulla', 10: 'Henry Williams', 11: 'Smith David'}, 'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-11', 2: 'NAME_GROUP-12', 3: 'NAME_GROUP-13', 4: 'NAME_GROUP-13', 5: 'NAME_GROUP-14', 6: 'NAME_GROUP-14', 7: 'NAME_GROUP-15', 8: 'NAME_GROUP-16', 9: 'NAME_GROUP-16', 10: 'NAME_GROUP-11', 11: 'NAME_GROUP-13'}}
    last_name  first_name  full_name          name_unique_identifier
0   Williams   Henry       Williams Henry     NAME_GROUP-11
1   Henry      Williams    Henry Williams     NAME_GROUP-11
2   XYX        ABC         XYX ABC            NAME_GROUP-12
3   Smith      David       Smith David        NAME_GROUP-13
4   David      Smith       David Smith        NAME_GROUP-13
5   Freeman    Walter      Freeman Walter     NAME_GROUP-14
6   Walter     Freeman     Walter Freeman     NAME_GROUP-14
7   Test_A     Test_B      Test_A Test_B      NAME_GROUP-15
8   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
9   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
10  Henry      Williams    Henry Williams     NAME_GROUP-11
11  Smith      David       Smith David        NAME_GROUP-13
df_B:
{'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-13', 2: 'NAME_GROUP-14'}, 'full_name': {0: 'Henry Williams', 1: 'Smith David', 2: 'Freeman Walter'}, 'last_name': {0: 'Henry', 1: 'Smith', 2: 'Freeman'}, 'first_name': {0: 'Williams', 1: 'David', 2: 'Walter'}}
   name_unique_identifier  full_name       last_name  first_name
0  NAME_GROUP-11           Henry Williams  Henry      Williams
1  NAME_GROUP-13           Smith David     Smith      David
2  NAME_GROUP-14           Freeman Walter  Freeman    Walter
Wherever a name_unique_identifier is present in both df_A and df_B, the last_name and first_name columns of df_A should be filled with the last_name and first_name from df_B. Entries without a match do not need to be updated.
Example: NAME_GROUP-14 is present in both df_A and df_B, so last_name and first_name in df_A for that identifier should be "Freeman" and "Walter".
Since I am dealing with millions of records, I need an efficient technique.
You can loop over each value of the name_unique_identifier column in df_B, find the rows of df_A that carry the same value, and then copy the values from df_B into df_A.
col = 'name_unique_identifier'
for val in df_B[col]:
    msk_A = df_A[col].eq(val)  # rows of df_A carrying this identifier
    msk_B = df_B[col].eq(val)  # the row of df_B with the same identifier
    # .values strips df_B's index, so the assignment is positional rather than index-aligned
    df_A.loc[msk_A, ['last_name', 'first_name']] = df_B.loc[msk_B, ['last_name', 'first_name']].values

# If you want to update 'full_name' based on the new values of 'last_name' and 'first_name'
df_A['full_name'] = df_A['last_name'] + " " + df_A['first_name']
print(df_A)
    last_name  first_name  full_name          name_unique_identifier
0   Henry      Williams    Henry Williams     NAME_GROUP-11
1   Henry      Williams    Henry Williams     NAME_GROUP-11
2   XYX        ABC         XYX ABC            NAME_GROUP-12
3   Smith      David       Smith David        NAME_GROUP-13
4   Smith      David       Smith David        NAME_GROUP-13
5   Freeman    Walter      Freeman Walter     NAME_GROUP-14
6   Freeman    Walter      Freeman Walter     NAME_GROUP-14
7   Test_A     Test_B      Test_A Test_B      NAME_GROUP-15
8   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
9   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
10  Henry      Williams    Henry Williams     NAME_GROUP-11
11  Smith      David       Smith David        NAME_GROUP-13
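Since the question mentions millions of records, a Python-level loop over identifiers may become slow when df_B holds many of them. Below is a minimal vectorized sketch of the same update using Series.map, assuming each name_unique_identifier appears only once in df_B. The small frames are an illustrative subset of the question's data, and the names key and lookup are introduced here for the example only.

```python
import pandas as pd

# Illustrative subset of the question's data (not the full frames).
df_A = pd.DataFrame({
    "last_name": ["Williams", "XYX", "Walter"],
    "first_name": ["Henry", "ABC", "Freeman"],
    "name_unique_identifier": ["NAME_GROUP-11", "NAME_GROUP-12", "NAME_GROUP-14"],
})
df_B = pd.DataFrame({
    "name_unique_identifier": ["NAME_GROUP-11", "NAME_GROUP-14"],
    "last_name": ["Henry", "Freeman"],
    "first_name": ["Williams", "Walter"],
})

key = "name_unique_identifier"
# Assumption: identifiers are unique in df_B, so it can act as a lookup table.
lookup = df_B.set_index(key)

for col in ["last_name", "first_name"]:
    # map() returns a missing value for identifiers absent from df_B;
    # fillna() then falls back to df_A's existing value for those rows.
    df_A[col] = df_A[key].map(lookup[col]).fillna(df_A[col])

# Rebuild full_name from the updated name columns.
df_A["full_name"] = df_A["last_name"] + " " + df_A["first_name"]
print(df_A)
```

The loop here runs once per column rather than once per identifier, so its cost does not grow with the number of distinct identifiers in df_B.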
This pandas solution may work for you:
import pandas as pd
df_A = pd.DataFrame({'last_name': {0: 'Williams', 1: 'Henry', 2: 'XYX', 3: 'Smith', 4: 'David', 5: 'Freeman', 6: 'Walter', 7: 'Test_A', 8: 'Mallesham', 9: 'Mallesham', 10: 'Henry', 11: 'Smith'}, 'first_name': {0: 'Henry', 1: 'Williams', 2: 'ABC', 3: 'David', 4: 'Smith', 5: 'Walter', 6: 'Freeman', 7: 'Test_B', 8: 'Yamulla', 9: 'Yamulla', 10: 'Williams', 11: 'David'}, 'full_name': {0: 'Williams Henry', 1: 'Henry Williams', 2: 'XYX ABC', 3: 'Smith David', 4: 'David Smith', 5: 'Freeman Walter', 6: 'Walter Freeman', 7: 'Test_A Test_B', 8: 'Mallesham Yamulla', 9: 'Mallesham Yamulla', 10: 'Henry Williams', 11: 'Smith David'}, 'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-11', 2: 'NAME_GROUP-12', 3: 'NAME_GROUP-13', 4: 'NAME_GROUP-13', 5: 'NAME_GROUP-14', 6: 'NAME_GROUP-14', 7: 'NAME_GROUP-15', 8: 'NAME_GROUP-16', 9: 'NAME_GROUP-16', 10: 'NAME_GROUP-11', 11: 'NAME_GROUP-13'}})
df_B = pd.DataFrame({'name_unique_identifier': {0: 'NAME_GROUP-11', 1: 'NAME_GROUP-13', 2: 'NAME_GROUP-14'}, 'full_name': {0: 'Henry Williams', 1: 'Smith David', 2: 'Freeman Walter'}, 'last_name': {0: 'Henry', 1: 'Smith', 2: 'Freeman'}, 'first_name': {0: 'Williams', 1: 'David', 2: 'Walter'}})
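# Left-join df_B onto df_A. df_A's own name columns receive the '_x' suffix and are dropped,
# leaving only df_B's values (NaN where there is no match) for update() to apply by index.
merged = pd.merge(df_A, df_B, how='left', on='name_unique_identifier', suffixes=['_x', None])
df_A.update(merged.drop(['last_name_x', 'first_name_x', 'full_name_x'], axis=1))
print(df_A)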
Result:
    last_name  first_name  full_name          name_unique_identifier
0   Henry      Williams    Henry Williams     NAME_GROUP-11
1   Henry      Williams    Henry Williams     NAME_GROUP-11
2   XYX        ABC         XYX ABC            NAME_GROUP-12
3   Smith      David       Smith David        NAME_GROUP-13
4   Smith      David       Smith David        NAME_GROUP-13
5   Freeman    Walter      Freeman Walter     NAME_GROUP-14
6   Freeman    Walter      Freeman Walter     NAME_GROUP-14
7   Test_A     Test_B      Test_A Test_B      NAME_GROUP-15
8   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
9   Mallesham  Yamulla     Mallesham Yamulla  NAME_GROUP-16
10  Henry      Williams    Henry Williams     NAME_GROUP-11
11  Smith      David       Smith David        NAME_GROUP-13
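One caveat with a merge followed by update(): update() matches rows by index label, and merge returns a fresh 0..n-1 index, so the combination lines up only when df_A itself has the default index. The sketch below shows one possible way to handle a df_A with arbitrary index labels, assuming identifiers are unique in df_B (enforced here with merge's validate argument). The frames are an illustrative subset of the question's data, and the index labels 10, 20, 30 are made up for the example.

```python
import pandas as pd

# Illustrative subset of the question's data with a non-default index (assumed labels).
df_A = pd.DataFrame(
    {
        "last_name": ["Williams", "XYX", "Walter"],
        "first_name": ["Henry", "ABC", "Freeman"],
        "name_unique_identifier": ["NAME_GROUP-11", "NAME_GROUP-12", "NAME_GROUP-14"],
    },
    index=[10, 20, 30],
)
df_B = pd.DataFrame({
    "name_unique_identifier": ["NAME_GROUP-11", "NAME_GROUP-14"],
    "last_name": ["Henry", "Freeman"],
    "first_name": ["Williams", "Walter"],
})

key = "name_unique_identifier"
cols = ["last_name", "first_name"]

# Left-join only the key from df_A against the columns to copy from df_B.
# validate="many_to_one" raises if an identifier is duplicated in df_B.
merged = df_A[[key]].merge(df_B[[key] + cols], on=key, how="left", validate="many_to_one")

# A many-to-one left merge keeps df_A's row order and row count,
# so df_A's index labels can be restored before updating.
merged.index = df_A.index

# update() overwrites only where merged has non-missing values.
df_A.update(merged[cols])
print(df_A)
```

Restricting the merge to the key and the two name columns also avoids carrying suffixed duplicates of every column through the join.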