
Using NLTK properly with Python

I am trying to use Python with NLTK to get a number of synonyms for a number of words (for now, two). It seems that I can get it to work for the first word, but not the second. I'm guessing I still have a lot to learn about NLTK. There is some simplified example code below. I am basically trying to get two lists of synonyms, one list per word. All was well with the first for loop. After I tried the second word I get:

syn2 = wn.synsets(word)[0].lemmas[y]
IndexError: list index out of range

I hope someone can help me understand why this is happening.

import nltk
from nltk.corpus import wordnet as wn

mylist2 = []
mylist3 = []

Web_Keywd = 'car loan'
wuser_words = Web_Keywd.split()

i = 0
for word in wuser_words:
    i = i + 1
    # first word
    if i == 1:
        synset1 = wn.synsets(word)
        y = 0
        for synset in synset1:
            syn1 = wn.synsets(word)[0].lemmas[y]
            syn1 = syn1.name
            mylist2.append(syn1)
            y = y + 1
    # second word
    if i == 2:
        y = 0
        for synset2 in wn.synsets(word):
            syn2 = wn.synsets(word)[0].lemmas[y]   # IndexError raised here
            syn2 = syn2.name
            mylist3.append(syn2)
            y = y + 1
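The traceback comes from indexing the lemma list of the *first* synset with a counter that runs over *all* synsets; those two lists generally have different lengths. The failure can be reproduced without WordNet at all, using hypothetical stand-ins for `wn.synsets(word)` and `wn.synsets(word)[0].lemmas`:

```python
# Hypothetical data: a word with several synsets whose first synset
# carries only one lemma -- mirroring wn.synsets(word) and
# wn.synsets(word)[0].lemmas in the question's NLTK version.
synsets = ["loan.n.01", "loanword.n.01", "loan.v.01"]
first_synset_lemmas = ["loan"]

names = []
error = ""
try:
    for y, _ in enumerate(synsets):
        # Bug: y counts synsets, but indexes the first synset's lemmas.
        names.append(first_synset_lemmas[y])
except IndexError as exc:
    error = str(exc)

print(names)   # only the lemmas reached before the counter overran
print(error)   # "list index out of range"
```

As soon as the word has more synsets than its first synset has lemmas, `y` runs past the end of the lemma list.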

I've perhaps misled you in my previous answer with the use of wn.synsets(word)[0].lemmas[y]. You need to loop over the lemmas explicitly, as you can't know in advance how many there are. (Note: in NLTK 3.x, lemmas and name became methods, so there you would write synset.lemmas() and lemma.name().) Example use case:

Web_Keywd = 'car loan cheap'

results = {}
for word in Web_Keywd.split():
    for synset in wn.synsets(word):
        for lemma in synset.lemmas:
            results.setdefault(word, []).append(lemma.name)

results now looks as follows:

{'car': ['car', 'auto', 'automobile', 'machine', ...],
 'loan': ['loan', 'loanword', 'loan', 'lend', 'loan', ...],
 'cheap': ['cheap', 'inexpensive', 'brassy', 'cheap', ...]}
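The setdefault(word, []).append(...) call builds each per-word list without pre-declaring it. Here is a minimal sketch of the same grouping pattern, with hypothetical (word, lemma) pairs standing in for the nested WordNet loops:

```python
# Hypothetical pairs, in place of wn.synsets(word) / synset.lemmas.
pairs = [
    ("car", "car"), ("car", "auto"), ("car", "automobile"),
    ("loan", "loan"), ("loan", "loanword"), ("loan", "loan"),
]

results = {}
for word, name in pairs:
    # setdefault creates the list on first sight of the word,
    # then always returns it, so append works in one expression.
    results.setdefault(word, []).append(name)

print(results)
# Note the repeated 'loan': lists keep duplicates.
```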

To get unique results for each word submitted, independently of the others:

.... # same as above
            results.setdefault(word, set()).add(lemma.name)

To get a single collection of unique lemma names across all the words submitted:

Web_Keywd = 'car loan cheap'

words = set(Web_Keywd.split())
results = set(
    lemma.name
    for word in words
    for synset in wn.synsets(word)
    for lemma in synset.lemmas
)
# results -> {'loanword', 'tatty', 'automobile', 'cheap', 'chinchy',...
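The nested generator expression flattens word → synset → lemma into one deduplicated set. The same shape can be sketched with plain nested data (hypothetical, in place of the WordNet calls):

```python
# Hypothetical lemma names per word: each word maps to a list of
# synsets, each synset to a list of lemma names -- standing in for
# wn.synsets(word) / synset.lemmas.
lemmas_by_word = {
    "car": [["car", "auto"], ["cable_car", "car"]],
    "cheap": [["cheap", "inexpensive"]],
}

results = set(
    name
    for word in lemmas_by_word
    for synset_lemmas in lemmas_by_word[word]
    for name in synset_lemmas
)

print(results)  # each lemma appears once, regardless of word or synset
```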
