I have 2 queries:
query1:你好世界
query2:你好
When I run this code using the Python library Levenshtein:
from Levenshtein import distance, hamming, median
lev_edit_dist = distance(query1, query2)
print(lev_edit_dist)
I get an output of 12. How is the value 12 derived?
In terms of stroke differences, there are definitely more than 12.
According to its documentation, it supports Unicode:
It supports both normal and Unicode strings, but can't mix them, all arguments to a function (method) have to be of the same type (or its subclasses).
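You need to make sure the Chinese characters are passed as Unicode strings, though. In Python 2, a plain '...' literal is a byte string, so distance compares the UTF-8 bytes (3 bytes per character here) instead of the characters. Decode the strings first: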
In [1]: from Levenshtein import distance, hamming, median
In [2]: query1 = '你好世界'
In [3]: query2 = '你好'
In [4]: print distance(query1,query2)
6
In [5]: print distance(query1.decode('utf8'),query2.decode('utf8'))
2
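To see where the byte-level number comes from, here is a minimal Python 3 sketch. It does not use the Levenshtein library; it assumes that library computes the standard Levenshtein distance and uses a hand-written dynamic-programming function (the name levenshtein is hypothetical) as a stand-in. The same function is run once on the text (one unit per character) and once on its UTF-8 encoding (one unit per byte). Note that the distance counts insertions, deletions and substitutions of whole units, so strokes never come into it. In Python 3, ordinary str literals are already Unicode, so no .decode step is needed.

```python
def levenshtein(a, b):
    """Hypothetical stand-in for Levenshtein.distance: classic DP edit distance.

    Works on any two sequences: str compares characters, bytes compares bytes.
    """
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


query1 = "你好世界"
query2 = "你好"

# As text: 4 vs 2 characters, so 2 deletions
print(len(query1), len(query2), levenshtein(query1, query2))

# As UTF-8 bytes: each of these characters is 3 bytes, so 12 vs 6 bytes, 6 deletions
b1, b2 = query1.encode("utf-8"), query2.encode("utf-8")
print(len(b1), len(b2), levenshtein(b1, b2))
```

Because query2 is a prefix of query1, the distance is just the length difference in whatever unit is being compared, which is why the byte-string call reports 3 times the character-level result.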