I am trying to use fuzzy string matching in both R and Python. I am using two packages: stringdist from R and fuzzywuzzy from Python.

When I try amatch("PARI", c("HELLO", "WORLD"), maxDist = 2) in R, I get NA as a result, which is intuitive. But when I try the same thing in Python, process.extract("PARI", ["HELLO", "WORLD"], limit = 2), I get [('world', 22), ('HELLO', 0)].

Could anyone tell me why I get a matching score of 22 between "PARI" and "WORLD"? How could I get the same result as in R? Thanks in advance.
The problem here is that limit = 2 asks for the two best results regardless of their score, whereas in R, maxDist = 2 says you only want a result if the strings are very close to one another. The score here is a measure from 0 to 100 of how similar the words are. PARI and WORLD both have R as their third letter, which is why you get a non-zero score, but it still isn't a very good one.
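To make the difference concrete, here is a minimal sketch in plain Python (standard library only, so it does not call fuzzywuzzy itself). It assumes the score of 22 comes from a simple ratio of 2 * matching characters / total characters on lowercased strings, and it assumes amatch is using its default optimal string alignment ("osa") distance. The helper names similarity, osa_distance and closest_within are made up for this illustration, and closest_within returns the matched string rather than an index as amatch does.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """0-100 score: 2 * matching characters / total characters, case-insensitive."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def closest_within(query, choices, max_dist):
    """Hypothetical amatch-like helper: closest choice within max_dist edits, else None."""
    best = min(choices, key=lambda c: osa_distance(query, c), default=None)
    if best is None or osa_distance(query, best) > max_dist:
        return None
    return best


print(similarity("PARI", "WORLD"))                     # 22 (one shared letter, 9 characters in total)
print(similarity("PARI", "HELLO"))                     # 0
print(osa_distance("PARI", "WORLD"))                   # 4
print(closest_within("PARI", ["HELLO", "WORLD"], 2))   # None, like NA in R
```

If you would rather stay within fuzzywuzzy, process.extractOne and process.extractBests accept a score_cutoff argument that drops weak matches. Keep in mind that this is a threshold on the 0 to 100 similarity score, not on the edit distance, so a given cutoff will not correspond exactly to maxDist = 2.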