Using Word2vec to determine the two most similar words in a set of words

Question

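I'm trying to use a Python wrapper around Word2vec. I have a word embedding, or rather a group of words (shown below), and I'm trying to determine which two of them are most similar to each other.

How can I do this?

['architect', 'nurse', 'surgeon', 'grandmother', 'dad']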

Answer 1

Going by your comment, and assuming you are using gensim's word2vec:

Load or train the model for your embeddings, then run the following against it:

# Assumes `model` is an already trained or loaded gensim Word2Vec model.
words = ['architect', 'nurse', 'surgeon', 'grandmother', 'dad']

min_distance = float('inf')
min_pair = None
word2vec_model_wv = model.wv  # look up the word vectors once, outside the loop
for candidate_word1 in words:
    for candidate_word2 in words:
        if candidate_word1 == candidate_word2:
            continue  # skip comparing a word with itself

        distance = word2vec_model_wv.distance(candidate_word1, candidate_word2)
        if distance < min_distance:
            min_pair = (candidate_word1, candidate_word2)
            min_distance = distance

https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.distance

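Alternatively, you could use similarity. In gensim, distance is defined as 1 minus the (cosine) similarity, so the two produce the same ranking. https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.similarity

Because similarity gets larger as the words get closer, with similarity you want to maximize rather than minimize: replace the distance call with a similarity call and flip the comparison. Basically, this is just a simple min/max over the pairs.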
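
To make the min/max point concrete, here is a minimal sketch that does not need a trained model. The two-dimensional vectors are made up for illustration only (real word2vec vectors come from the model and have many more dimensions), and `cosine_similarity` and `cosine_distance` are hypothetical helpers standing in for the gensim calls. It shows that minimizing distance and maximizing similarity pick the same pair.

```python
import math
from itertools import combinations

# Hypothetical toy vectors, invented for illustration; they are NOT real
# word2vec embeddings.
toy_vectors = {
    'architect': [1.0, 0.0],
    'nurse': [0.0, 1.0],
    'surgeon': [0.1, 1.0],
    'grandmother': [-1.0, 0.0],
    'dad': [-1.0, -0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (larger means closer)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """One minus the cosine similarity (smaller means closer)."""
    return 1.0 - cosine_similarity(u, v)


pairs = list(combinations(toy_vectors, 2))

# Minimizing distance and maximizing similarity select the same pair.
closest_by_distance = min(
    pairs, key=lambda p: cosine_distance(toy_vectors[p[0]], toy_vectors[p[1]]))
closest_by_similarity = max(
    pairs, key=lambda p: cosine_similarity(toy_vectors[p[0]], toy_vectors[p[1]]))

print(closest_by_distance, closest_by_similarity)
```

With a real model, the two lambdas would presumably call the model's distance and similarity methods instead of the toy helpers.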

Answer 2

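@rylan-feldspar's answer is generally the right approach and will work, but you can do it more compactly with standard Python libraries and idioms, in particular itertools, list comprehensions, and the sorting functions.

For example, first use combinations() from itertools to generate all pairs of candidate words: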

from itertools import combinations
candidate_words = ['architect', 'nurse', 'surgeon', 'grandmother', 'dad']
all_pairs = combinations(candidate_words, 2)

Then, decorate each pair with its pairwise similarity:

scored_pairs = [(w2v_model.wv.similarity(p[0], p[1]), p)
                for p in all_pairs]

Finally, sort so that the most similar pair comes first, and report its score and the pair:

sorted_pairs = sorted(scored_pairs, reverse=True)
print(sorted_pairs[0])  # first item is most-similar pair
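
A side benefit of the sorted list is that it ranks every pair, not just the best one. The sketch below illustrates this with made-up scores: `toy_scores` and `toy_similarity` are hypothetical stand-ins for the model's similarity lookup, so the numbers mean nothing beyond the example.

```python
from itertools import combinations

# Hypothetical similarity scores, invented for illustration; in practice
# they would come from the model's similarity lookup.
toy_scores = {
    frozenset(('nurse', 'surgeon')): 0.71,
    frozenset(('grandmother', 'dad')): 0.64,
    frozenset(('architect', 'surgeon')): 0.38,
}


def toy_similarity(word_a, word_b):
    """Stand-in for a model lookup; unlisted pairs default to 0.0."""
    return toy_scores.get(frozenset((word_a, word_b)), 0.0)


candidate_words = ['architect', 'nurse', 'surgeon', 'grandmother', 'dad']

# Decorate each pair with its score; tuples sort by score first.
scored_pairs = [(toy_similarity(a, b), (a, b))
                for a, b in combinations(candidate_words, 2)]
ranked = sorted(scored_pairs, reverse=True)

top_three = ranked[:3]  # the three most similar pairs, best first
for score, pair in top_three:
    print(f'{score:.2f}  {pair}')
```

Because the tuples are compared element by element, pairs with equal scores fall back to comparing the word tuples themselves, which only affects the order of ties.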

If you want it compact but less readable, it can be a (long) one-liner:

print(sorted([(w2v_model.wv.similarity(p[0], p[1]), p) 
              for p in combinations(candidate_words, 2)
             ], reverse=True)[0])

Update:

Integrating @ryan-feldspar's suggestion about max(), and going for minimality, this should also work to report the best pair (but not its score):

print(max(combinations(candidate_words, 2),
          key=lambda p: w2v_model.wv.similarity(p[0], p[1])))
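
One practical caveat with any of these variants: as far as I know, gensim raises a KeyError when asked about a word that is not in the model's vocabulary. A hedged sketch of a wrapper that filters such words first is below. `most_similar_pair` is a hypothetical helper, not part of gensim, and the toy vocabulary and scores are invented so the example runs without a model.

```python
from itertools import combinations


def most_similar_pair(words, similarity, in_vocab):
    """Return the most similar pair among the words that pass in_vocab.

    `similarity` is any callable taking two words; `in_vocab` is any
    callable reporting whether a word can be scored. Returns None when
    fewer than two usable words remain.
    """
    known = [w for w in words if in_vocab(w)]
    if len(known) < 2:
        return None
    return max(combinations(known, 2),
               key=lambda p: similarity(p[0], p[1]))


# Hypothetical stand-ins for a model's vocabulary and similarity lookup.
toy_vocab = {'nurse', 'surgeon', 'dad'}
toy_scores = {
    frozenset(('nurse', 'surgeon')): 0.7,
    frozenset(('surgeon', 'dad')): 0.2,
}


def toy_similarity(word_a, word_b):
    return toy_scores.get(frozenset((word_a, word_b)), 0.0)


candidate_words = ['architect', 'nurse', 'surgeon', 'grandmother', 'dad']
best = most_similar_pair(candidate_words, toy_similarity,
                         lambda w: w in toy_vocab)
print(best)
```

With a real model, the call would presumably look like most_similar_pair(candidate_words, w2v_model.wv.similarity, lambda w: w in w2v_model.wv), assuming the installed gensim version supports that membership test.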

2 answers

Answer 1
2 2019-03-15 18:17:47

Answer 2
1 accepted 2019-03-15 19:19:37