I have two dataframes that should have the same data but come from different sources. I would like to return the column names from df1 and the corresponding match rate for that column when compared against the equivalent in df2.
Inputs:
df1 =
ID  Age  Value  Name
1   10   1000   Red
2   20   2000   Blue
3   30   3000   Orange
4   40   4000   Grey
df2 =
Age_2  Value_2  Name_2
10     1000     red
20     1500     blue
30     3000     orange
40     4000     white
Desired output:
Name   MatchRate
ID     N/A
Age    1.00
Value  0.75
Name   0.75
I suggest using difflib.SequenceMatcher to compare the strings. Its ratio() method returns a similarity score between 0 and 1 for each pair, for example:
import difflib
import pandas as pd

def get_ratio(x, y):
    # Similarity of two strings in [0, 1]; the comparison is case-sensitive
    return difflib.SequenceMatcher(None, x, y).ratio()

df = pd.DataFrame({
    "col1": ["Red", "Blue", "Orange", "Grey", "White"],
    "col2": ["red", "blue", "orange", "grey", "black"],
})
# Compare col1 and col2 row by row
df["ratio"] = df.apply(lambda row: get_ratio(row.col1, row.col2), axis=1)
print(df)
This gives the output:
     col1    col2     ratio
0     Red     red  0.666667
1    Blue    blue  0.750000
2  Orange  orange  0.833333
3    Grey    grey  0.750000
4   White   black  0.000000
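For the per-column table the question asks for, something along these lines might work. It is only a sketch under a few assumptions: the rows of df1 and df2 already line up by position, each df2 column is named after its df1 counterpart plus "_2", and values are compared as lowercased strings (which is what the desired 0.75 for Name implies). The names match_rates and suffix are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [10, 20, 30, 40],
    "Value": [1000, 2000, 3000, 4000],
    "Name": ["Red", "Blue", "Orange", "Grey"],
})
df2 = pd.DataFrame({
    "Age_2": [10, 20, 30, 40],
    "Value_2": [1000, 1500, 3000, 4000],
    "Name_2": ["red", "blue", "orange", "white"],
})

def match_rates(left, right, suffix="_2"):
    """Share of rows where each column of `left` equals its counterpart in `right`.

    Assumes both frames have the same number of rows in the same order.
    """
    rows = []
    for col in left.columns:
        other = col + suffix
        if other not in right.columns:
            # No counterpart in the second frame: report NaN (the "N/A" case)
            rows.append((col, float("nan")))
            continue
        # Compare as lowercased strings so "Red" and "red" count as a match
        a = left[col].astype(str).str.lower().reset_index(drop=True)
        b = right[other].astype(str).str.lower().reset_index(drop=True)
        rows.append((col, (a == b).mean()))
    return pd.DataFrame(rows, columns=["Name", "MatchRate"])

result = match_rates(df1, df2)
print(result)
```

If near matches should count partially rather than as a miss, the exact comparison could be swapped for the mean of get_ratio over the paired values, at the cost of a slower row-by-row pass.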