Compare two dataframes with differently formatted column values
I have two dataframes.

df1:

AccountNo | name | a/ctype
---|---|---
11.22.21 | Henry | checking
11.22.22 | Sam | Saving
11.22.23 | John | Checking

df2:

AccountNo | name | a/ctype
---|---|---
11-22-21 | Henry | checking
11-22-23 | John | Checking
11-22-24 | Rita | Checking

Desired output, df3:

A/cNO_df1 | A/cNO_df2 | result | Name_df1 | Name_df2 | result | a/ctype_df1 | a/ctype_df2 | result
---|---|---|---|---|---|---|---|---
11.22.21 | 11-22-21 | Match | Henry | Henry | Match | checking | checking | Match
11.22.22 | | Notindf2 | Sam | | Notindf2 | Saving | | Notindf2
11.22.23 | 11-22-23 | Match | John | John | Match | Checking | Checking | Match
| 11-22-24 | Notindf1 | | Rita | Notindf1 | | Checking | Notindf1

I tried stripping the non-numeric characters from the account numbers so that the two datasets could be compared:

    df1['AccountNo'] = df1['AccountNo'].replace(regex=r'\D+', value='')
    df2['AccountNo'] = df2['AccountNo'].replace(regex=r'\D+', value='')

I then concatenated the two dataframes. However, once the characters are removed I can't print the account numbers in their original format, and I can't concatenate the accounts that appear in only one of the dataframes (not in df1, or not in df2). I also tried using `numpy.where` to compare and concatenate.
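
A minimal sketch of one way around the formatting problem, assuming pandas is available: instead of overwriting `AccountNo`, put the digits-only version in a separate helper column (the name `key` is hypothetical) and merge on that, so the original values survive in both frames. An outer merge with `indicator=True` also keeps the accounts that exist in only one dataframe.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "AccountNo": ["11.22.21", "11.22.22", "11.22.23"],
    "name": ["Henry", "Sam", "John"],
    "a/ctype": ["checking", "Saving", "Checking"],
})
df2 = pd.DataFrame({
    "AccountNo": ["11-22-21", "11-22-23", "11-22-24"],
    "name": ["Henry", "John", "Rita"],
    "a/ctype": ["checking", "Checking", "Checking"],
})

# Digits-only join key in a separate (hypothetical) "key" column,
# so the original AccountNo values keep their formatting.
df1["key"] = df1["AccountNo"].str.replace(r"\D+", "", regex=True)
df2["key"] = df2["AccountNo"].str.replace(r"\D+", "", regex=True)

# how="outer" keeps accounts found in only one frame; indicator=True adds
# a "_merge" column holding "both", "left_only" or "right_only".
merged = df1.merge(df2, on="key", how="outer",
                   suffixes=("_df1", "_df2"), indicator=True)
print(merged[["AccountNo_df1", "AccountNo_df2", "_merge"]])
```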

Is there a way to do this?

You can merge using an external Series as the key, rewriting df2's dashes as dots so that its account numbers line up with df1's:

    df1.merge(df2, left_on='AccountNo', right_on=df2['AccountNo'].str.replace('-', '.'),
              suffixes=('_df1', '_df2'), how='outer')

Output:

      AccountNo AccountNo_df1 name_df1 a/ctype_df1 AccountNo_df2 name_df2 a/ctype_df2
    0  11.22.21      11.22.21    Henry    checking      11-22-21    Henry    checking
    1  11.22.22      11.22.22      Sam      Saving           NaN      NaN         NaN
    2  11.22.23      11.22.23     John    Checking      11-22-23     John    Checking
    3  11.22.24           NaN      NaN         NaN      11-22-24     Rita    Checking
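
A merge like this lines the rows up but does not yet produce the `result` columns of the desired df3. The following is a sketch of one way to add them, under several assumptions: the only formatting difference is `-` versus `.`, names and account types are compared case-insensitively, and the `<column>_result` names and the `Mismatch` label for differing values are hypothetical choices that are not part of the question. `indicator=True` marks each row as `both`, `left_only` or `right_only`, and `numpy.select` maps that to `Match`, `Notindf2` or `Notindf1`.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "AccountNo": ["11.22.21", "11.22.22", "11.22.23"],
    "name": ["Henry", "Sam", "John"],
    "a/ctype": ["checking", "Saving", "Checking"],
})
df2 = pd.DataFrame({
    "AccountNo": ["11-22-21", "11-22-23", "11-22-24"],
    "name": ["Henry", "John", "Rita"],
    "a/ctype": ["checking", "Checking", "Checking"],
})

# Common join key (hypothetical "key" column): rewrite df2's dashes as dots,
# assuming that is the only formatting difference. AccountNo is untouched.
df1["key"] = df1["AccountNo"]
df2["key"] = df2["AccountNo"].str.replace("-", ".", regex=False)

m = df1.merge(df2, on="key", how="outer",
              suffixes=("_df1", "_df2"), indicator=True)

only_df1 = m["_merge"] == "left_only"   # account missing from df2
only_df2 = m["_merge"] == "right_only"  # account missing from df1

parts = []
for col in ["AccountNo", "name", "a/ctype"]:
    left, right = m[f"{col}_df1"], m[f"{col}_df2"]
    if col == "AccountNo":
        # Rows matched on the normalised key are a match by construction.
        same = m["_merge"] == "both"
    else:
        # Case-insensitive comparison (assumption: "Checking" == "checking").
        same = left.str.lower() == right.str.lower()
    # np.select uses the first condition that is True for each row.
    result = np.select(
        [only_df1, only_df2, same],
        ["Notindf2", "Notindf1", "Match"],
        default="Mismatch",  # hypothetical label for differing values
    )
    parts.append(pd.DataFrame({f"{col}_df1": left,
                               f"{col}_df2": right,
                               f"{col}_result": result}))

df3 = pd.concat(parts, axis=1)
print(df3)
```

Because an outer merge sorts on the join key, the rows come out in account-number order, which matches the row order of the desired df3.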