
Getting the most appropriate string match by score using fuzzywuzzy and Python 3.6

I am trying to match strings using the fuzzy matching library fuzzywuzzy in my Python application. I found that fuzzywuzzy does not always give appropriate results: even when two scores are equal, it lists the wrong result in the first position.
Here is an example:

>>> from fuzzywuzzy import process
>>> d = ['John Welsh', 'Patrick Walsh', 'Jonathan Walsh']
>>> e = process.extract('jwalsh', d)
>>> e = sorted(e, key=lambda k: k[1], reverse=True)
>>> e
[('Patrick Walsh', 75), ('Jonathan Walsh', 75), ('John Welsh', 62)]

As one can see, the query string is jwalsh and the most appropriate result is Jonathan Walsh, which should be in the first position but is in the second.
Please suggest how I can correct the results so that the most appropriate result is displayed first. This example involves a tie, but there are also cases where the most appropriate result gets a lower score than a worse match.
What can I do to get the best output? Are there any alternatives to fuzzywuzzy?

This is similar to another question I answered recently.

Since you aren't specifying a scorer, process.extract defaults to fuzz.WRatio. Because each of your choices is at least 1.66 times as long as your query (10/6 for the shortest choice), WRatio is allowed to use fuzz.partial_ratio, which gives the same score to 'Patrick Walsh' and 'Jonathan Walsh' since they both include the string ' Walsh'.
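To make that reasoning concrete, here is a rough sketch of the decision, not fuzzywuzzy's actual code. The 1.5 length-ratio threshold and the 0.9 scale applied to the partial score are my reading of the library's WRatio implementation and may differ between versions; the token-based scorers that WRatio also checks are left out. The per-scorer values are the ones reported further down in this answer for fuzz.ratio and fuzz.partial_ratio, hard-coded so the sketch runs without the library.

```python
query = 'jwalsh'
choices = ['John Welsh', 'Patrick Walsh', 'Jonathan Walsh']

# Assumed constants, taken from my reading of fuzzywuzzy's WRatio.
PARTIAL_THRESHOLD = 1.5  # partial matching is considered at or above this
PARTIAL_SCALE = 0.9      # partial scores are scaled down before comparison

# Scores reported in this answer for each scorer (not recomputed here).
ratio_scores = {'John Welsh': 62, 'Jonathan Walsh': 60, 'Patrick Walsh': 53}
partial_scores = {'John Welsh': 67, 'Jonathan Walsh': 83, 'Patrick Walsh': 83}


def length_ratio(a, b):
    """Length of the longer string divided by the length of the shorter."""
    return max(len(a), len(b)) / min(len(a), len(b))


def approx_wratio(choice):
    """Simplified WRatio: best of the plain ratio and the scaled partial ratio."""
    base = ratio_scores[choice]
    if length_ratio(query, choice) < PARTIAL_THRESHOLD:
        return base
    return round(max(base, partial_scores[choice] * PARTIAL_SCALE))


scores = {choice: approx_wratio(choice) for choice in choices}
for choice in choices:
    print(choice, round(length_ratio(query, choice), 2), scores[choice])
```

Under these assumptions the sketch reproduces the scores from the question: both Walsh entries come from the scaled partial score (83 * 0.9 = 74.7, rounded to 75), while 'John Welsh' keeps its plain ratio of 62.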

To fix this, consider using an average (or weighted average) of two or more different scorers. For example:

from fuzzywuzzy import fuzz, process

x = process.extract('jwalsh', d, scorer=fuzz.ratio)
# [('John Welsh', 62), ('Jonathan Walsh', 60), ('Patrick Walsh', 53)]

y = process.extract('jwalsh', d, scorer=fuzz.partial_ratio)
# [('Patrick Walsh', 83), ('Jonathan Walsh', 83), ('John Welsh', 67)]

I'm a little rusty dealing with tuples so I don't have the exact code to average these together, but a straight average of these scores would give:

[('Patrick Walsh', 68), ('Jonathan Walsh', 71.5), ('John Welsh', 64.5)]

This gives 'Jonathan Walsh' the highest score, which is the correct answer in this case. With more variation in the queries and choices you might need to adjust the scorers used and the weights in the average, but this should point you in the right direction.
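One way to do the averaging is sketched below. The helper name average_scores and its weights parameter are made up for illustration and are not part of fuzzywuzzy; the input lists are the fuzz.ratio and fuzz.partial_ratio results shown above, hard-coded so the snippet runs on its own.

```python
def average_scores(*results, weights=None):
    """Combine several [(choice, score), ...] lists into one ranked list.

    Hypothetical helper, not part of fuzzywuzzy. Each list gets one weight;
    a choice missing from a list simply contributes 0 for that list.
    """
    if weights is None:
        weights = [1] * len(results)  # plain average by default
    total_weight = sum(weights)

    combined = {}
    for weight, result in zip(weights, results):
        for choice, score in result:
            combined[choice] = combined.get(choice, 0) + weight * score

    ranked = [(choice, total / total_weight) for choice, total in combined.items()]
    # Highest combined score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Results reported above for fuzz.ratio (x) and fuzz.partial_ratio (y).
x = [('John Welsh', 62), ('Jonathan Walsh', 60), ('Patrick Walsh', 53)]
y = [('Patrick Walsh', 83), ('Jonathan Walsh', 83), ('John Welsh', 67)]

best = average_scores(x, y)
print(best)
# [('Jonathan Walsh', 71.5), ('Patrick Walsh', 68.0), ('John Welsh', 64.5)]
```

Passing something like weights=[2, 1] would favour the first scorer. Note that process.extract returns only the top few matches by default (its limit parameter), so with a longer list of choices you would probably want to raise that limit so that every scorer reports a score for every choice.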
