
Extract similar words from a corpus

I want to extract similar words from a corpus, where similarity is based on the strings themselves: when the strings of two words are highly similar, the two words should be extracted as similar words. For example, if the corpus contains: Aras, bahro, arasis, adkpo, bah, aras sd, kio.

Similar words:

1- aras, arasis, aras sd

2- bahro, bah

How can I solve this problem? Thanks.

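Levenshtein distance is a metric for measuring the difference between two sequences. You could take each pair of words and compute the distance between them to see whether they are similar.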
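As a rough sketch of that idea in Python: the code below computes the Levenshtein distance with the standard dynamic-programming recurrence, then groups words whose similarity passes a cutoff. Several choices here are assumptions rather than part of the question: words are lowercased before comparison, the distance is normalized by the longer word's length, the `threshold` of 0.5 was picked only because it happens to separate the example words, and grouping is transitive (if A matches B and B matches C, all three land in one group). The function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_similar(words, threshold=0.5):
    """Group words whose pairwise similarity is >= threshold (assumed cutoff)."""
    # Lowercase and drop duplicates while keeping first-seen order.
    words = list(dict.fromkeys(w.lower() for w in words))
    parent = list(range(len(words)))  # union-find forest over word indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair: O(n^2) distance computations.
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if similarity(words[i], words[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)
    # Words with no similar partner are left out of the result.
    return [g for g in groups.values() if len(g) > 1]


corpus = ["Aras", "bahro", "arasis", "adkpo", "bah", "aras sd", "kio"]
print(group_similar(corpus))
# [['aras', 'arasis', 'aras sd'], ['bahro', 'bah']]
```

Because every pair of words is compared, this only scales to small vocabularies. The threshold will likely need tuning on real data, since a cutoff that is too low chains unrelated words into one group.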

