
WordNet Python word similarity

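I'm trying to find a reliable way to measure the semantic similarity of 2 terms. The first metric could be the path distance on the hyponym/hypernym graph (eventually, a linear combination of 2-3 metrics might work better...).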

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.path_similarity(cat))
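  • I still don't understand what n.01 means and why it is necessary.
  • Is there a way to visually show the computed path between 2 terms?
  • What other nltk semantic metrics can I use?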

1. I still don't understand what n.01 means and why it is necessary.

The nltk source shows that the name has the form "WORD.PART-OF-SPEECH.SENSE-NUMBER".

Quoting the source:

Create a Lemma from a "<word>.<pos>.<number>.<lemma>" string where:
<word> is the morphological stem identifying the synset
<pos> is one of the module attributes ADJ, ADJ_SAT, ADV, NOUN or VERB
<number> is the sense number, counting from 0.
<lemma> is the morphological form of interest

n means noun. I also suggest reading up on the WordNet dataset.
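To make the naming scheme concrete, here is a minimal plain-Python sketch (no NLTK needed) that splits a synset name into its three parts. The helper `parse_synset_name` and the `POS_NAMES` table are hypothetical illustrations; the single-letter codes are the values NLTK exposes as `wn.NOUN`, `wn.VERB`, `wn.ADJ`, `wn.ADJ_SAT` and `wn.ADV`. In the names themselves the first sense is numbered 01 (as in dog.n.01), and the number belongs to the synset's head word: frump.n.01, which appears in the output further down, is the first sense of "frump" even though it is only the second synset returned for "dog".

```python
# Single-letter part-of-speech codes used in WordNet synset names
# (the values of NLTK's wn.NOUN, wn.VERB, wn.ADJ, wn.ADJ_SAT, wn.ADV).
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Hypothetical helper: split 'dog.n.01' into (word, pos, sense_number)."""
    # Split from the right: the last two fields never contain dots,
    # so a word that itself contains dots is kept intact.
    word, pos, sense = name.rsplit(".", 2)
    return word, POS_NAMES[pos], int(sense)


for name in ["dog.n.01", "frump.n.01", "chase.v.01"]:
    print(name, "->", parse_synset_name(name))
```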

2. Is there a way to visually show the computed path between 2 terms?

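Please look at the similarity section of the nltk WordNet documentation. You have several path-based algorithms to choose from (and you can try mixing several of them).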
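path_similarity scores two senses as 1 / (d + 1), where d is the number of edges on the shortest is-a path connecting them. As one way to "see" such a path, here is a plain-Python sketch that runs a breadth-first search over a small, hand-made hypernym hierarchy and prints the path it finds. The `HYPERNYMS` graph and the `shortest_path` helper are hypothetical and only loosely mirror WordNet; they are not read from the corpus. For the real hierarchy, NLTK's `Synset.hypernym_paths()`, `Synset.lowest_common_hypernyms(other)` and `Synset.shortest_path_distance(other)` are the places to start.

```python
from collections import deque

# Hypothetical, hand-made fragment of an is-a hierarchy (child -> parents).
# It only loosely resembles WordNet and is NOT loaded from the real corpus.
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
}


def neighbours(graph):
    """Treat each is-a edge as undirected so a path can go up and back down."""
    adj = {}
    for child, parents in graph.items():
        for parent in parents:
            adj.setdefault(child, set()).add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    adj = neighbours(graph)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path(HYPERNYMS, "dog", "cat")
distance = len(path) - 1  # number of edges
print(" -> ".join(path))
print("path similarity:", 1 / (distance + 1))
```

Because the toy hierarchy is a tree, the shortest undirected path always climbs to a shared ancestor (here carnivore) and back down, which is the kind of path the path-based measures are built on.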

A few examples from the nltk docs:

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))
print(dog.lch_similarity(cat))
print(dog.wup_similarity(cat))
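For reference, here is roughly what the three calls compute, following the NLTK docstrings and source (treat the exact off-by-one conventions as approximate). Here d(s_1, s_2) is the number of edges on the shortest is-a path between the two senses, D is the maximum depth of the taxonomy for that part of speech, and lcs is the lowest common subsumer (the most specific hypernym the two senses share). Unlike the other two, lch is not bounded by 1, so it would need rescaling before being mixed linearly with them as the question suggests.

```latex
\mathrm{path}(s_1, s_2) = \frac{1}{d(s_1, s_2) + 1}
\qquad
\mathrm{lch}(s_1, s_2) = -\log \frac{d(s_1, s_2) + 1}{2D}
\qquad
\mathrm{wup}(s_1, s_2) = \frac{2\,\mathrm{depth}(\mathrm{lcs})}{\mathrm{depth}(s_1) + \mathrm{depth}(s_2)}
```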

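For visualization, you can build a similarity matrix M[i,j] (which can be turned into a distance matrix if your plotting method needs one), where: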

M[i,j] = word_similarity(i, j)

and plot it using the following Stack Overflow answer.
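A minimal sketch of building that matrix, assuming `word_similarity` is any function that takes two words and returns a score in [0, 1]. The `toy_similarity` function and its scores are made up for illustration; in practice it could wrap `path_similarity` on chosen synsets or a word2vec model.

```python
def similarity_matrix(words, word_similarity):
    """Build M[i][j] = word_similarity(words[i], words[j]) as a list of lists."""
    return [[word_similarity(a, b) for b in words] for a in words]


def to_distance(matrix):
    """Turn similarities in [0, 1] into distances in [0, 1] via 1 - similarity."""
    return [[1.0 - value for value in row] for row in matrix]


# Hypothetical pairwise scores, for illustration only.
TOY_SCORES = {
    frozenset(["dog", "cat"]): 0.2,
    frozenset(["dog", "car"]): 0.05,
    frozenset(["cat", "car"]): 0.05,
}


def toy_similarity(a, b):
    """Stand-in for a real word_similarity function."""
    if a == b:
        return 1.0
    return TOY_SCORES[frozenset([a, b])]


words = ["dog", "cat", "car"]
M = similarity_matrix(words, toy_similarity)
for word, row in zip(words, M):
    print(f"{word:>4}", " ".join(f"{v:.2f}" for v in row))
```

For a heatmap, one option is matplotlib's `plt.imshow(M)` with the words as tick labels; a clustering or embedding method that expects distances could be fed `to_distance(M)` instead.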

3. What other nltk semantic metrics can I use?

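As mentioned above, there are several ways to compute word similarity. I also suggest looking into gensim: I used its word2vec implementation for word similarity, and it worked well for me.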
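word2vec similarity between two words is the cosine similarity of their vectors; with a trained gensim model this is what `model.wv.similarity(w1, w2)` returns. The sketch below shows just that computation in plain Python on made-up 3-dimensional vectors (real word2vec vectors are learned from a corpus and usually have hundreds of dimensions).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up "embeddings" for illustration only.
vectors = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print("dog/cat:", cosine_similarity(vectors["dog"], vectors["cat"]))
print("dog/car:", cosine_similarity(vectors["dog"], vectors["car"]))
```

Note that a plain word2vec model keeps one vector per word, so it sidesteps the sense-selection problem discussed in the update below rather than solving it.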

If you need help choosing an algorithm, please provide more information about the problem you are facing.

Update:

More information about what the word sense number means can be found at:

Senses in WordNet are generally ordered from most to least frequently used, with the most common sense numbered 1...

The problem is that "dog" is ambiguous, and you have to choose the right sense for it.

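You could pick the first sense as a naive approach, or come up with your own algorithm for choosing the right sense based on your application or research.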
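A minimal sketch contrasting the naive first-sense approach with one simple alternative, taking the best score over all sense pairs. It assumes each word's senses come as a list ordered most-frequent first (as `wn.synsets(word)` returns them) and that `sim` is a pairwise measure that may return None when two senses are not comparable. The sense labels and scores below are made up for illustration, and best-pair is just one heuristic, not a standard answer.

```python
def first_sense_similarity(senses1, senses2, sim):
    """Naive approach: compare only the first (most frequent) sense of each word."""
    return sim(senses1[0], senses2[0])


def best_pair_similarity(senses1, senses2, sim):
    """Alternative heuristic: the highest score over all sense pairs.

    Pairs for which sim returns None (no comparable path) are skipped;
    returns None if no pair can be scored.
    """
    scores = [sim(a, b) for a in senses1 for b in senses2]
    scores = [s for s in scores if s is not None]
    return max(scores, default=None)


# Hypothetical sense inventories and scores, for illustration only.
SENSES = {
    "dog": ["dog/animal", "dog/informal-man"],
    "guy": ["guy/man", "guy/rope"],
}
SCORES = {
    ("dog/animal", "guy/man"): 0.1,
    ("dog/animal", "guy/rope"): 0.05,
    ("dog/informal-man", "guy/man"): 0.5,
    ("dog/informal-man", "guy/rope"): None,
}


def toy_sim(a, b):
    return SCORES[(a, b)]


print("first sense:", first_sense_similarity(SENSES["dog"], SENSES["guy"], toy_sim))
print("best pair:  ", best_pair_similarity(SENSES["dog"], SENSES["guy"], toy_sim))
```

Here the first-sense approach compares the animal sense of "dog" with "guy" and misses the "you lucky dog" reading, which the best-pair heuristic picks up; the trade-off is that best-pair can also match unintended senses.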

To get all the available definitions of a word (the WordNet documentation calls them synsets), you can simply call wn.synsets(word).

I encourage you to dig into the metadata contained in each of these synsets.

The code below shows a simple example of retrieving this metadata and printing it nicely.

from nltk.corpus import wordnet as wn

dog_synsets = wn.synsets('dog')

for i, syn in enumerate(dog_synsets):
    print('%d. %s' % (i, syn.name()))
    print('alternative names (lemmas): "%s"' % '", "'.join(syn.lemma_names()))
    print('definition: "%s"' % syn.definition())
    if syn.examples():
        print('example usage: "%s"' % '", "'.join(syn.examples()))
    print('\n')

Code output:

0. dog.n.01
alternative names (lemmas): "dog", "domestic_dog", "Canis_familiaris"
definition: "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds"
example usage: "the dog barked all night"


1. frump.n.01
alternative names (lemmas): "frump", "dog"
definition: "a dull unattractive unpleasant girl or woman"
example usage: "she got a reputation as a frump", "she's a real dog"


2. dog.n.03
alternative names (lemmas): "dog"
definition: "informal term for a man"
example usage: "you lucky dog"


3. cad.n.01
alternative names (lemmas): "cad", "bounder", "blackguard", "dog", "hound", "heel"
definition: "someone who is morally reprehensible"
example usage: "you dirty dog"


4. frank.n.02
alternative names (lemmas): "frank", "frankfurter", "hotdog", "hot_dog", "dog", "wiener", "wienerwurst", "weenie"
definition: "a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll"


5. pawl.n.01
alternative names (lemmas): "pawl", "detent", "click", "dog"
definition: "a hinged catch that fits into a notch of a ratchet to move a wheel forward or prevent it from moving backward"


6. andiron.n.01
alternative names (lemmas): "andiron", "firedog", "dog", "dog-iron"
definition: "metal supports for logs in a fireplace"
example usage: "the andirons were too hot to touch"


7. chase.v.01
alternative names (lemmas): "chase", "chase_after", "trail", "tail", "tag", "give_chase", "dog", "go_after", "track"
definition: "go after with the intent to catch"
example usage: "The policeman chased the mugger down the alley", "the dog chased the rabbit"
