
Iterate one list of synsets over another

I have two sets of WordNet synsets (contained in two separate list objects, s1 and s2). For each synset in s1, I want to find the maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should contain 4 scores.

I have experimented with the following code so far:

import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

# two lists of WordNet synsets (s1, s2)
s1 = [wn.synset('be.v.01'), wn.synset('angstrom.n.01'), wn.synset('trial.n.02'), wn.synset('function.n.01')]
s2 = [wn.synset('use.n.01'), wn.synset('function.n.01'), wn.synset('check.n.01'), wn.synset('code.n.01'),
      wn.synset('inch.n.01'), wn.synset('be.v.01'), wn.synset('correct.v.01')]

# find the highest path similarity score of each synset in s1 against s2,
# so that the output has the same length as s1
ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        # the TypeError below is raised on this line
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)

But it raises the following error:

'>' not supported between instances of 'NoneType' and 'float'

I couldn't figure out what's going on with the code. Would anyone care to take a look at it and share their insights on the for loop? It would be really appreciated.

Thank you.

The full error traceback is here:

TypeError                                 Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
     38     return word_list
     39 
---> 40 s = similarity_score(s1, s2)
     41 
     42 

<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
     33 def similarity_score(s1, s2):
     34     for word1 in s1:
---> 35         best = max(wn.path_similarity(word1, word2) for word2 in s2)
     36         word_list.append(best)
     37 

TypeError: '>' not supported between instances of 'NoneType' and 'float'
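The traceback points at the max() call. In Python 3, max() compares each new item with the current maximum using `>`, and None cannot be ordered against a float, so a single None among the scores is enough to raise this exact error. NLTK documents that path_similarity returns None when no connecting path can be found between two synsets (this may happen, for example, between a verb such as be.v.01 and a noun, depending on the NLTK version). The minimal sketch below reproduces the error without NLTK; the score values are made up for illustration.

```python
# Hypothetical scores: a float, then None (what path_similarity returns
# when it finds no connecting path), then another float.
scores = [0.25, None, 0.1]


def reproduce_error(values):
    """Return the TypeError message raised by max(), or None if max() succeeds."""
    try:
        max(values)
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce_error(scores)
print(message)
# → '>' not supported between instances of 'NoneType' and 'float'
```

With only floats in the list (e.g. `[0.25, 0.1]`), max() succeeds and reproduce_error returns None.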

[edit] I came up with this temporary solution:

s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)

It's not elegant, but it works. I wonder if anyone has better solutions or handy tricks to handle this?

I have no experience with the nltk module, but from reading the docs I can see that path_similarity is a method of whatever object wn.synset(args) returns. You are instead treating it as a function.

What you should be doing is something like this:

ps_list = []
for word1 in s1:
    best = max(word1.path_similarity(word2) for word2 in s2)  # path_similarity is a method of each synset
    ps_list.append(best)

I think the error comes from the following line:

best = max(wn.path_similarity(word1, word2) for word2 in s2)

You should add a condition: if wn.path_similarity(word1, word2) returns None, that value cannot be compared inside max(). For instance, you can rewrite the line like this:

best = max([word1.path_similarity(word2) for word2 in s2 if word1.path_similarity(word2) is not None])
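The list comprehension above calls path_similarity twice per pair, and if every score for some word1 is None, the filtered list is empty and max() raises ValueError instead. A minimal sketch that scores each pair once and uses max()'s `default` argument (available since Python 3.4) to return None for such synsets, so the output still has len(s1) entries. The names best_scores_skip_none, TOY_SCORES and toy_similarity are hypothetical stand-ins so it runs without NLTK; with NLTK you would pass something like `lambda a, b: a.path_similarity(b)`.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def best_scores_skip_none(
    s1: Sequence[T],
    s2: Sequence[T],
    similarity: Callable[[T, T], Optional[float]],
) -> List[Optional[float]]:
    """Highest non-None similarity of each item in s1 to the items in s2.

    Each pair is scored exactly once. If no pair for an item has a score,
    that item's entry is None, so the result always has len(s1) entries.
    """
    results: List[Optional[float]] = []
    for word1 in s1:
        # Score every pair once, then drop the None values.
        scores = (similarity(word1, word2) for word2 in s2)
        valid = (score for score in scores if score is not None)
        # default=None avoids ValueError when every score was None.
        results.append(max(valid, default=None))
    return results


# Toy stand-in for path_similarity: hypothetical scores keyed by (a, b);
# pairs not in the table return None, mimicking "no connecting path".
TOY_SCORES = {("a", "x"): 0.5, ("a", "y"): 0.2, ("b", "y"): 0.25}


def toy_similarity(a: str, b: str) -> Optional[float]:
    return TOY_SCORES.get((a, b))


print(best_scores_skip_none(["a", "b", "c"], ["x", "y"], toy_similarity))
# → [0.5, 0.25, None]
```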

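Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.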

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM