I want to extract similar words from a corpus. Similarity is based on the strings themselves: when the strings of two words are highly similar, the two words should be extracted as similar words. For example, suppose the corpus contains: Aras, bahro, arasis, adkpo, bah, aras sd, kio.
Similar words:
1- aras, arasis, aras sd
2- bahro, bah
How can I solve this problem? Thanks.
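Levenshtein distance is a metric for measuring the difference between two sequences, here the character strings of two words. You could take your list of words and compute the distance between pairs to see whether they are similar.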
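As a rough sketch of that idea in Python: the code below computes the Levenshtein distance with the standard dynamic-programming recurrence, turns it into a similarity score between 0 and 1, and greedily puts each word into the first group whose first member is similar enough. The function names (`levenshtein`, `similarity`, `group_similar`), the case-insensitive comparison, the normalization by the longer word's length, and the 0.5 threshold are all assumptions made for illustration, not something given in the question; the threshold in particular would need tuning on a real corpus.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 - distance / length of the longer word.
    Comparison is case-insensitive (an assumption, so 'Aras' ~ 'aras')."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def group_similar(words, threshold=0.5):
    """Greedy grouping: each word joins the first group whose first
    member is at least `threshold` similar, else starts a new group."""
    groups = []
    for word in words:
        for group in groups:
            if similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


corpus = ["Aras", "bahro", "arasis", "adkpo", "bah", "aras sd", "kio"]
for group in group_similar(corpus):
    print(group)
```

With this threshold the example corpus yields the two groups from the question (Aras, arasis, aras sd and bahro, bah), with adkpo and kio left as single-word groups. The greedy pass depends on word order and only compares against the first member of each group; for a large corpus a proper clustering method or a dedicated edit-distance library may be a better fit.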