How do I compute Levenshtein distance with Simplified Chinese characters?

Question

I have 2 queries:

    query1:你好世界
    query2:你好

When I run this code using the Python library Levenshtein:

    from Levenshtein import distance

    # query1 and query2 hold the two strings shown above
    lev_edit_dist = distance(query1, query2)
    print(lev_edit_dist)

I get an output of 12. How is the value 12 derived?

In terms of stroke differences, the number would definitely be more than 12.

Answer 1

According to its documentation, it supports Unicode:

It supports both normal and Unicode strings, but can't mix them, all arguments to a function (method) have to be of the same type (or its subclasses).

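You need to make sure the Chinese characters are Unicode strings, though. In Python 2, a plain string literal is a byte string, so the distance is counted in UTF-8 bytes (3 per character here) unless you decode it first: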

    In [1]: from Levenshtein import distance, hamming, median

    In [2]: query1 = '你好世界'

    In [3]: query2 = '你好'

    In [4]: print distance(query1,query2)
    6

    In [5]: print distance(query1.decode('utf8'),query2.decode('utf8'))
    2
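
To see why the two calls disagree, here is a minimal sketch in Python 3, where `str` is already Unicode. It uses a small pure-Python edit distance written for illustration (the helper `levenshtein` and the variables `char_dist` and `byte_dist` are hypothetical names, and this is not the Levenshtein library's own implementation). It compares the same two queries once as code points and once as UTF-8 bytes.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (str, bytes, list, ...)."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


query1 = "你好世界"
query2 = "你好"

# Compared as Unicode code points: two characters must be deleted.
char_dist = levenshtein(query1, query2)

# Compared as UTF-8 bytes: each of these characters is 3 bytes long.
byte_dist = levenshtein(query1.encode("utf-8"), query2.encode("utf-8"))

print(char_dist)  # 2
print(byte_dist)  # 6
```

Either way the distance counts edited units (characters or bytes), not strokes, so it says nothing about how visually different the characters are.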

Answer 1 (4, accepted), 2015-06-19 00:36:40
