
Extract similar words from a corpus

Original post: 2014-08-28 05:30:47 1 1 nlp/ string-matching/ similarity/ text-extraction/ approximate

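I want to extract similar words from a corpus. Similarity is string-based: two words should be extracted as similar when their strings are highly similar. For example, suppose the corpus contains: Aras, bahro, arasis, adkpo, bah, aras sd, kio.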

Similar words:

1- aras, arasis, aras SD

2- bahro, bah

How can I solve this problem? Thanks.

1 Answer

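The Levenshtein distance is a metric for measuring how much two sequences differ: it counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Maybe you can take the words in the corpus and calculate the distances between them to find out whether they are similar.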
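As a rough illustration of that suggestion, here is a minimal pure-Python sketch. The function names (`levenshtein`, `similarity`, `group_similar`), the case-insensitive comparison, the length normalization, the 0.5 threshold, and the greedy "compare against the first word of each group" strategy are all assumptions made for this example; none of them come from the question or the answer, and the threshold in particular would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; case is ignored (an assumption)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def group_similar(words, threshold=0.5):
    """Greedy grouping: a word joins the first group whose first member
    is at least `threshold` similar; otherwise it starts a new group."""
    groups = []
    for word in words:
        for group in groups:
            if similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


corpus = ["Aras", "bahro", "arasis", "adkpo", "bah", "aras sd", "kio"]
for group in group_similar(corpus):
    if len(group) > 1:  # only report words that have a similar partner
        print(group)
```

With the example corpus from the question and the assumed 0.5 threshold, this prints the two groups the question asks for (Aras / arasis / aras sd, and bahro / bah), while adkpo and kio stay ungrouped. The greedy pass depends on word order and compares every word against every group, so for a large corpus a real clustering method or an indexed lookup would likely be a better fit.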

