How to find abstractness of a word using hyper-/hyponyms in WordNet?

I have two words, say computer and tool. Computer is a concrete noun, whereas tool is relatively abstract. I want to get a level of abstractness for each word that reflects this. I thought the best way to do it would be to count the number of hypernyms/hyponyms for each word.

  1. Is it possible?
  2. Is there a better way to do it?

Thanks!

The first problem is: which meaning of computer are you referring to?

In WordNet, a word can have several "concepts", a.k.a. synsets:

>>> from nltk.corpus import wordnet as wn

>>> wn.synsets('computer')
[Synset('computer.n.01'), Synset('calculator.n.01')]

>>> wn.synsets('computer')[0].definition()
'a machine for performing calculations automatically'
>>> wn.synsets('computer')[1].definition()
'an expert at calculation (or at operating calculating machines)'

And hyper-/hyponyms are not connected to the word computer

Hyper-/hyponyms are concepts too, i.e. synsets, so they are connected not to the form/word but to the synsets that the word computer might represent, i.e.

>>> type(wn.synsets('computer')[0])
<class 'nltk.corpus.reader.wordnet.Synset'>

>>> wn.synsets('computer')[0].hypernyms()
[Synset('machine.n.01')]

>>> wn.synsets('computer')[0].hyponyms()
[Synset('analog_computer.n.01'), Synset('digital_computer.n.01'), Synset('home_computer.n.01'), Synset('node.n.08'), Synset('number_cruncher.n.02'), Synset('pari-mutuel_machine.n.01'), Synset('predictor.n.03'), Synset('server.n.03'), Synset('turing_machine.n.01'), Synset('web_site.n.01')]

Yes, that's a lot of information, but how do I get hyper-/hyponyms for words?

According to the definition, should words have hyper-/hyponyms? Or should concepts have hypo-/hypernyms?

Fine, you're leading me in circles... Just tell me how to use hyper-/hyponyms to see if a word is more abstract than another word!!!

Okay, then we have to make some assumptions.

  1. Let's consider all synsets of a word accessed through WordNet as one "holistic" concept of that word form

  2. We consider the sum of all DIRECT hyper-/hyponyms of all synsets of a given word

  3. Based on the number of hyper-/hyponyms of all synsets that can be represented by a certain word form, we deduce that word X is more/less abstract than word Y

But how do I do (1), (2) and (3) in code?

>>> hypernym_count = lambda word: sum(len(ss.hypernyms()) for ss in wn.synsets(word)) 
>>> hyponym_count = lambda word: sum(len(ss.hyponyms()) for ss in wn.synsets(word)) 

>>> hyponym_count('computer')
14
>>> hypernym_count('computer')
2


>>> hypernym_count('tool')
8
>>> hyponym_count('tool')
32

Since (3) is the hypothesis you want to test, you should be the one to decide which heuristic to use to deduce whether a word is more or less abstract from the hyponym_count and hypernym_count results.
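To make the point about (3) concrete, here is a minimal sketch of two possible heuristics applied to the counts printed above. Both scoring functions (by_hyponym_count, by_hyponym_share) and the helper more_abstract are hypothetical names invented for illustration; neither heuristic comes from WordNet or NLTK, and neither is claimed to be the right one.

```python
# Direct (hypernym, hyponym) counts copied from the interpreter session above.
counts = {"computer": (2, 14), "tool": (8, 32)}


def by_hyponym_count(word):
    """Heuristic A (an assumption): more direct hyponyms means more abstract."""
    _, n_hypo = counts[word]
    return n_hypo


def by_hyponym_share(word):
    """Heuristic B (an assumption): hyponyms as a share of all direct neighbours."""
    n_hyper, n_hypo = counts[word]
    total = n_hyper + n_hypo
    return n_hypo / total if total else 0.0


def more_abstract(score, a, b):
    """Return whichever of a and b the given scoring function ranks higher."""
    return a if score(a) >= score(b) else b


print(more_abstract(by_hyponym_count, "computer", "tool"))  # tool
print(more_abstract(by_hyponym_share, "computer", "tool"))  # computer
```

The two heuristics disagree on the same numbers (32 vs 14 for the raw count, 0.8 vs 0.875 for the share), which is exactly why the choice has to be tested against what you mean by "abstract".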

Wait a minute, what are DIRECT hyper-/hyponyms?

We're only accessing the hyper-/hyponyms one level above/below the synset. That's what "direct" means here.

To get all the hyponyms below a synset, see https://stackoverflow.com/a/42012001/610569
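The difference between direct and all hyper-/hyponyms can be sketched as a plain graph traversal. This is not the code from the linked answer; count_reachable and ToySynset are hypothetical names, and the four-node hierarchy is a toy stand-in (not real WordNet data) so the example runs without downloading the corpus. The function only assumes objects with hypernyms() and hyponyms() methods, which NLTK synsets provide.

```python
from collections import deque


def count_reachable(synset, relation):
    """Count distinct synsets reachable from `synset` by repeatedly applying
    `relation`, e.g. lambda s: s.hyponyms(). The start synset is not counted."""
    seen = set()
    queue = deque(relation(synset))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(relation(current))
    return len(seen)


class ToySynset:
    """Stand-in for an NLTK Synset; only hypernyms()/hyponyms() are modelled."""

    def __init__(self, name):
        self.name = name
        self.parents, self.children = [], []

    def hypernyms(self):
        return self.parents

    def hyponyms(self):
        return self.children


def link(parent, child):
    """Record that `child` is a direct hyponym of `parent`."""
    parent.children.append(child)
    child.parents.append(parent)


# Toy hierarchy: machine > computer > {analog_computer, digital_computer}
machine, computer = ToySynset("machine"), ToySynset("computer")
analog, digital = ToySynset("analog_computer"), ToySynset("digital_computer")
link(machine, computer)
link(computer, analog)
link(computer, digital)

print(len(machine.hyponyms()))                             # 1 (direct only)
print(count_reachable(machine, lambda s: s.hyponyms()))    # 3 (all levels below)
print(count_reachable(digital, lambda s: s.hypernyms()))   # 2 (all levels above)
```

With NLTK, the same function should accept a real synset such as wn.synsets('computer')[0] in place of the toy objects. NLTK synsets also offer a closure() method that walks a relation transitively, which may be the more idiomatic route.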

So should I use the direct hyponyms, all hyponyms below, or all hypernyms above?

That's for you to find out and tell us =) Have fun!
