
Extract similar words from a corpus

I want to extract similar words from a corpus. Similarity here is based on the strings themselves: when the strings of two words are highly similar, the two words should be extracted as similar words. For example, suppose the corpus contains: Aras, bahro, arasis, adkpo, bah, aras sd, kio.

Similar words:

1- aras, arasis, aras sd

2- bahro, bah

How can I solve this problem? Thanks.

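Levenshtein distance is a metric for measuring the difference between two sequences, here the character strings of two words: it counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Perhaps you could take each pair of words in the corpus and compute this distance to see whether they are similar.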
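As a rough illustration of that idea, here is a minimal sketch in plain Python. It makes several assumptions that are not part of the question or the answer: words are lowercased before comparison, the distance is normalized by the length of the longer word, two words count as similar when that normalized similarity is at least 0.5, and words are grouped transitively (if A is similar to B and B to C, all three land in one group). The function names `levenshtein`, `similarity`, and `group_similar` are made up for this example, and the threshold would need tuning on real data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_similar(words, threshold=0.5):
    """Group words whose pairwise similarity reaches the (assumed) threshold."""
    # Lowercase and drop duplicates while keeping the original order.
    words = list(dict.fromkeys(w.lower() for w in words))
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the group's representative word.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    # Compare every pair once and merge the groups of similar words.
    for a, b in combinations(words, 2):
        if similarity(a, b) >= threshold:
            parent[find(b)] = find(a)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # Only report groups that contain at least two words.
    return [g for g in groups.values() if len(g) > 1]


corpus = ["Aras", "bahro", "arasis", "adkpo", "bah", "aras sd", "kio"]
print(group_similar(corpus))
# [['aras', 'arasis', 'aras sd'], ['bahro', 'bah']]
```

Normalizing by length matters here: the raw distance between "aras" and "aras sd" is 3, the same as between "aras" and "bah", so a fixed cutoff on the raw distance could not separate the two cases. Comparing every pair is quadratic in the number of distinct words, which is fine for a small vocabulary but may be too slow for a large corpus.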

