
How to use NLTK WordNet to check for incomplete words in Python?

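I have a set of words:

{corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise}

I want to compute the semantic similarity between every pair of words in this set. I have one problem:

  1. Some words are incomplete, such as "vhicule". How can I ignore these words?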

Sample code, taken from "Python: Passing variables into Wordnet Synsets methods in NLTK":

import itertools as IT
import nltk.corpus as corpus

if __name__ == "__main__":
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree"]

    for word1, word2 in IT.product(list1, list2):
        # Take the first synset of each word. Indexing with [0] raises
        # IndexError when WordNet has no synsets for the word.
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # In NLTK 3, Synset.name is a method, so it must be called.
        print('{w1}, {w2}: {s}'.format(
            w1=wordFromList1.name(),
            w2=wordFromList2.name(),
            s=wordFromList1.wup_similarity(wordFromList2)))

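Suppose I add "vhicule" to either list. I get the following error:

IndexError: list index out of range

How can I use this error to ignore words that do not exist in the database?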
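
Read literally, "using the error" means catching the IndexError raised when an empty result is indexed. Below is a minimal sketch of that idea. The helper name first_synset and its lookup parameter are hypothetical and not part of NLTK. When lookup is omitted, the sketch assumes NLTK and its WordNet corpus are installed.

```python
def first_synset(word, lookup=None):
    """Return the first synset found for `word`, or None if there is none.

    `lookup` is a hypothetical hook: any callable that maps a word to a
    list of synsets. It defaults to nltk.corpus.wordnet.synsets.
    """
    if lookup is None:
        # Imported lazily; requires the WordNet corpus to be downloaded.
        from nltk.corpus import wordnet
        lookup = wordnet.synsets
    try:
        return lookup(word)[0]
    except IndexError:
        # lookup() returned an empty list: the word is not in the database.
        return None
```

A caller can then skip any word for which first_synset returns None instead of letting the loop crash.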

You can check whether nltk.corpus.wordnet.synsets(i) returns a non-empty list of synsets:

>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> y = [i for i in x if len(wn.synsets(i)) > 0]
>>> y
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']

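A more concise way is to test the result of wn.synsets(i) directly. For an unknown word it returns an empty list (not None), and an empty list is falsy: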

>>> from nltk.corpus import wordnet as wn
>>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
>>> x
['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
>>> [i for i in x if wn.synsets(i)]
['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
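
The filter above can be combined with the similarity computation from the question. The following is a sketch under stated assumptions, not a definitive implementation. The names wordnet_lookup and pairwise_similarity are hypothetical. As in the question, the first synset stands in for each word, and Wu-Palmer similarity (wup_similarity) is the measure.

```python
from itertools import combinations


def wordnet_lookup(word):
    """Return the WordNet synsets for `word` (an empty list if unknown)."""
    # Imported lazily; requires NLTK and its WordNet corpus.
    from nltk.corpus import wordnet as wn
    return wn.synsets(word)


def pairwise_similarity(words, lookup=wordnet_lookup):
    """Score every pair of known words using the first synset of each.

    Words for which `lookup` returns an empty list are silently skipped.
    Returns a dict mapping (word1, word2) to the Wu-Palmer similarity.
    """
    first = {}
    for word in words:
        synsets = lookup(word)
        if synsets:  # an empty list means the word is not in WordNet
            first[word] = synsets[0]
    return {
        (w1, w2): first[w1].wup_similarity(first[w2])
        for w1, w2 in combinations(first, 2)
    }
```

Calling pairwise_similarity(x) on the list x defined above would score only the words that WordNet knows, so "distingus" and "vhicule" never reach the [0] indexing that raised the IndexError. Taking the first synset ignores a word's other senses, which is a simplification inherited from the question's code.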
