String fuzzy matching: from R to Python

I am trying to do fuzzy string matching in both R and Python. I am currently using two packages:

  1. stringdist from R
  2. fuzzywuzzy from Python

When I run amatch("PARI", c("HELLO", "WORLD"), maxDist = 2) in R, I get NA, which is intuitive. But when I try the same thing in Python with process.extract("PARI", ["HELLO", "WORLD"], limit = 2), I get [('WORLD', 22), ('HELLO', 0)].

Could anyone tell me why "PARI" and "WORLD" get a matching score of 22? How can I get the same result as in R? Thanks in advance.

The two calls ask for different things. In Python, limit = 2 says you want the 2 best-scoring choices regardless of how low their scores are. In R, maxDist = 2 says you only want a match if it is within an edit distance of 2 of the query, so nothing is returned when no candidate is that close. The fuzzywuzzy score is a similarity measure from 0 to 100. PARI and WORLD both have R as their third letter, which is why you get a non-zero score, but 22 is still a very weak match.
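To make the numbers concrete, here is a minimal sketch that uses only the standard library. It assumes the score behaves like a plain sequence-similarity ratio on lowercased strings, which reproduces the 22 and 0 above but is not fuzzywuzzy's full default scorer. The helper names simple_ratio and best_match and the cutoff of 80 are hypothetical, chosen for illustration. Note that a score cutoff is not the same unit as maxDist (a similarity threshold versus an edit distance), so this only approximates the R behaviour of returning nothing when no candidate is close enough.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    a, b = a.lower(), b.lower()
    # ratio() is 2 * matches / (len(a) + len(b)); "pari" and "world"
    # share only "r", so the result is 2 * 1 / 9, roughly 0.22.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def best_match(query, choices, score_cutoff=80):
    """Return (choice, score) for the best choice, or None below the cutoff.

    score_cutoff=80 is an arbitrary illustrative threshold, not a value
    equivalent to maxDist = 2 in R.
    """
    scored = [(choice, simple_ratio(query, choice)) for choice in choices]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < score_cutoff:
        return None
    return best


print(simple_ratio("PARI", "WORLD"))                    # 22
print(simple_ratio("PARI", "HELLO"))                    # 0
print(best_match("PARI", ["HELLO", "WORLD"]))           # None
print(best_match("PARI", ["HELLO", "WORLD", "PARIS"]))  # ('PARIS', 89)
```

With fuzzywuzzy itself, process.extractOne accepts a score_cutoff argument and returns None when no choice scores above it, which plays a similar role to the filter in best_match.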
