Calculate matching percentage of 2 dataframes in python
When joining df2 onto df1 using First_Name, Last_Name and Email, how can I calculate the percentage of df2 rows that can be matched to df1?
df1:
   First_Name  Last_Name                     Email  Value1
0       Aaron     Potter     aaronpotter@gmail.com      10
1       Bella    Granger    bellagranger@gmail.com       2
2         Ron      Black         black@hotmail.com      20
3       Harry    Weasley  harryweasley@hotmail.com      11
df2:
   First_Name  Last_Name                    Email  Value2
0       Aaron     Potter    aaronpotter@gmail.com      10
1      Ronald      Black  ronaldblack@hotmail.com       5
2       Bella    Granger   bellagranger@gmail.com       2
3       Harry    Weasley    tomriddle@hotmail.com      20
For example, in this case the matching percentage is 2 out of 4.
@anky has a great solution. I'll add the indicator parameter to merge so the matches can be seen visually.
df_out = df1.merge(df2, on = ['First_Name', 'Last_Name', 'Email'],
                   indicator='Matched', how='outer')
df_out
Output:
   First_Name  Last_Name                     Email  Value1  Value2     Matched
0       Aaron     Potter     aaronpotter@gmail.com    10.0    10.0        both
1       Bella    Granger    bellagranger@gmail.com     2.0     2.0        both
2         Ron      Black         black@hotmail.com    20.0     NaN   left_only
3       Harry    Weasley  harryweasley@hotmail.com    11.0     NaN   left_only
4      Ronald      Black   ronaldblack@hotmail.com     NaN     5.0  right_only
5       Harry    Weasley     tomriddle@hotmail.com     NaN    20.0  right_only
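To tally how many rows fall into each category of the indicator column, something like the following might help. It is only a sketch: it rebuilds the sample frames from the question so it can run on its own, and the names `keys` and `counts` are mine, not from the original posts.

```python
import pandas as pd

keys = ["First_Name", "Last_Name", "Email"]  # join columns from the question

df1 = pd.DataFrame({
    "First_Name": ["Aaron", "Bella", "Ron", "Harry"],
    "Last_Name": ["Potter", "Granger", "Black", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "bellagranger@gmail.com",
              "black@hotmail.com", "harryweasley@hotmail.com"],
    "Value1": [10, 2, 20, 11],
})
df2 = pd.DataFrame({
    "First_Name": ["Aaron", "Ronald", "Bella", "Harry"],
    "Last_Name": ["Potter", "Black", "Granger", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "ronaldblack@hotmail.com",
              "bellagranger@gmail.com", "tomriddle@hotmail.com"],
    "Value2": [10, 5, 2, 20],
})

# Outer merge keeps every row from both frames and labels its origin.
df_out = df1.merge(df2, on=keys, how="outer", indicator="Matched")

# Count rows per label: both / left_only / right_only.
counts = df_out["Matched"].value_counts()
print(counts)
```

With the sample data, each of the three labels should cover two rows.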
Or, with a left join:
df_out = df1.merge(df2, on = ['First_Name', 'Last_Name', 'Email'],
indicator='Matched', how='left')
print(df_out)
Output:
   First_Name  Last_Name                     Email  Value1  Value2    Matched
0       Aaron     Potter     aaronpotter@gmail.com      10    10.0       both
1       Bella    Granger    bellagranger@gmail.com       2     2.0       both
2         Ron      Black         black@hotmail.com      20     NaN  left_only
3       Harry    Weasley  harryweasley@hotmail.com      11     NaN  left_only
And using @anky's solution:
(df_out['Matched'] == 'both').sum()/df_out.shape[0]
Output:
0.5
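Note that the left join above divides by the number of rows in df1, while the question asks about the share of df2. Both frames have 4 rows here, so the result is the same, but if the sizes differ it may be safer to start the merge from df2. A minimal self-contained sketch under that assumption (the names `keys`, `merged` and `match_pct` are mine):

```python
import pandas as pd

keys = ["First_Name", "Last_Name", "Email"]  # join columns from the question

df1 = pd.DataFrame({
    "First_Name": ["Aaron", "Bella", "Ron", "Harry"],
    "Last_Name": ["Potter", "Granger", "Black", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "bellagranger@gmail.com",
              "black@hotmail.com", "harryweasley@hotmail.com"],
    "Value1": [10, 2, 20, 11],
})
df2 = pd.DataFrame({
    "First_Name": ["Aaron", "Ronald", "Bella", "Harry"],
    "Last_Name": ["Potter", "Black", "Granger", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "ronaldblack@hotmail.com",
              "bellagranger@gmail.com", "tomriddle@hotmail.com"],
    "Value2": [10, 5, 2, 20],
})

# Start from df2 so the denominator is the number of df2 rows.
# Dropping duplicate keys in df1 keeps the merge from multiplying df2 rows.
merged = df2.merge(df1[keys].drop_duplicates(), on=keys,
                   how="left", indicator="Matched")

# Share of df2 rows that found a partner in df1, as a percentage.
match_pct = (merged["Matched"] == "both").mean() * 100
print(match_pct)  # 50.0 for the sample data
```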
@Scott Boston's answer is perfect. If you only have the 'First_Name', 'Last_Name' and 'Email' columns, you could also use the following code.
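# Stack only the key columns of both frames on top of each other.
df = pd.concat([df1[['First_Name','Last_Name','Email']],df2[['First_Name','Last_Name','Email']]])
df = df.reset_index(drop=True)
# Group identical rows; a group of size 2 occurs in both frames
# (assuming no duplicate rows within a single frame).
gb = df.groupby(list(df.columns))
idx = [x[0] for x in gb.groups.values() if len(x) == 2]
df.reindex(idx)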
   First_Name  Last_Name                   Email
0       Aaron     Potter   aaronpotter@gmail.com
1       Bella    Granger  bellagranger@gmail.com
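The concat/groupby idea gives the matching rows, not the percentage itself. One way to get the ratio from it is sketched below. It assumes that a key combination never repeats within a single frame, so a group of size 2 always means "present in both". The names `keys`, `stacked` and `n_matched` are mine.

```python
import pandas as pd

keys = ["First_Name", "Last_Name", "Email"]  # join columns from the question

df1 = pd.DataFrame({
    "First_Name": ["Aaron", "Bella", "Ron", "Harry"],
    "Last_Name": ["Potter", "Granger", "Black", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "bellagranger@gmail.com",
              "black@hotmail.com", "harryweasley@hotmail.com"],
})
df2 = pd.DataFrame({
    "First_Name": ["Aaron", "Ronald", "Bella", "Harry"],
    "Last_Name": ["Potter", "Black", "Granger", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "ronaldblack@hotmail.com",
              "bellagranger@gmail.com", "tomriddle@hotmail.com"],
})

# Stack the key columns of both frames and count identical rows.
stacked = pd.concat([df1[keys], df2[keys]], ignore_index=True)
group_sizes = stacked.groupby(keys).size()

# Assumption: no duplicates inside a frame, so size 2 means "in both".
n_matched = int((group_sizes == 2).sum())
print(n_matched / len(df2))  # 0.5 for the sample data
```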