
Fuzzy match strings in one column and create new dataframe using fuzzywuzzy

I have the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })
   id      fruits
0   1       apple
1   2      apples
2   3      orange
3   4  apple tree
4   5     oranges
5   6       mango

I would like to find fuzzy matches among the strings in the fruits column and build a new dataframe like the one under "Expected result" below, keeping only pairs whose ratio_score is higher than 80.

How can I do that in Python using the fuzzywuzzy package? Thanks. Please note that the ratio_score values in the expected result are made up for illustration.

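My attempt:

from fuzzywuzzy import fuzz

# Note: this compares each row only with its own copy, so ratio_score is always 100
df.loc[:, 'fruits_copy'] = df['fruits']
df['ratio_score'] = df[['fruits', 'fruits_copy']].apply(lambda row: fuzz.ratio(row['fruits'], row['fruits_copy']), axis=1)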

Expected result:

     id      fruits    matched_id     matched_fruits   ratio_score   
0     1       apple        2                apples           95
1     1       apple        4            apple tree           85     
2     2      apples        4            apple tree           80   
3     3      orange        5               oranges           95     
4     6       mango         

Related references:

Fuzzy matching a sorted column with itself using python

Apply fuzzy matching across a dataframe column and save results in a new column

How do I fuzzy match items in a column of an array in python?

Using fuzzywuzzy to create a column of matched results in the data frame

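My second attempt, based on the reference "Apply fuzzy matching across a dataframe column and save results in a new column":

import pandas as pd
from fuzzywuzzy import fuzz

df.loc[:, 'fruits_copy'] = df['fruits']

# Build every (fruit, fruit) pair, including each string paired with itself
compare = pd.MultiIndex.from_product([df['fruits'],
                                      df['fruits_copy']]).to_series()

def metrics(tup):
    # Score one pair with two fuzzywuzzy metrics
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     index=['ratio', 'token'])

compare.apply(metrics)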

                       ratio  token
apple      apple         100    100
           apples         91     91
           orange         36     36
           apple tree     67     67
           oranges        33     33
           mango          20     20
apples     apple          91     91
           apples        100    100
           orange         33     33
           apple tree     62     62
           oranges        46     46
           mango          18     18
orange     apple          36     36
           apples         33     33
           orange        100    100
           apple tree     25     25
           oranges        92     92
           mango          55     55
apple tree apple          67     67
           apples         62     62
           orange         25     25
           apple tree    100    100
           oranges        24     24
           mango          13     13
oranges    apple          33     33
           apples         46     46
           orange         92     92
           apple tree     24     24
           oranges       100    100
           mango          50     50
mango      apple          20     20
           apples         18     18
           orange         55     55
           apple tree     13     13
           oranges        50     50
           mango         100    100
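
One possible way to get from this full cross table to the layout under "Expected result" is to score each unordered pair once, keep the pairs above the threshold, and append the rows that matched nothing. The following is only a sketch under assumptions: the helper name fuzzy_pairs and its threshold parameter are made up for illustration, and the difflib fallback is a stand-in that only runs when fuzzywuzzy is not installed.

```python
from itertools import combinations

import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: stand-in scorer for environments without fuzzywuzzy.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6],
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })

COLUMNS = ['id', 'fruits', 'matched_id', 'matched_fruits', 'ratio_score']


def fuzzy_pairs(frame, threshold=80):
    """Hypothetical helper: one row per pair scoring above `threshold`."""
    records = list(zip(frame['id'].tolist(), frame['fruits'].tolist()))
    rows = []
    matched_ids = set()

    # combinations() yields each unordered pair once and never pairs
    # a string with itself, so no 100-score self matches appear.
    for (id_a, a), (id_b, b) in combinations(records, 2):
        score = ratio(a, b)
        if score > threshold:
            rows.append({'id': id_a, 'fruits': a, 'matched_id': id_b,
                         'matched_fruits': b, 'ratio_score': score})
            matched_ids.update([id_a, id_b])

    # Keep strings that matched nothing, with empty match columns.
    for id_a, a in records:
        if id_a not in matched_ids:
            rows.append({'id': id_a, 'fruits': a, 'matched_id': None,
                         'matched_fruits': None, 'ratio_score': None})

    return pd.DataFrame(rows, columns=COLUMNS)


result = fuzzy_pairs(df)
print(result)
```

With the real scores from the table above, only apple/apples (91) and orange/oranges (92) exceed 80, so apple tree and mango end up as unmatched rows. Because those rows hold missing values, pandas stores matched_id and ratio_score as floats. The pairwise loop is quadratic in the number of rows, which is fine for a small column but may be slow for a large one.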
