How to find abstractness of a word using hyper-/hyponyms in WordNet?
I have 2 words, let's say computer and tool. Computer is a concrete noun whereas tool is relatively abstract. I want to get a level of abstractness for each word that reflects this. I thought the best way to do it is by counting the number of hyper-/hyponyms for each word.
Thanks!
The first question is: which sense of computer are you referring to?
In WordNet, a word has different "concepts", aka synsets:
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('computer')
[Synset('computer.n.01'), Synset('calculator.n.01')]
>>> wn.synsets('computer')[0].definition()
'a machine for performing calculations automatically'
>>> wn.synsets('computer')[1].definition()
'an expert at calculation (or at operating calculating machines)'
And the hyper-/hyponyms are not connected to the word computer
The hyper-/hyponyms are concepts, i.e. synsets, too, so they are connected not to the form/word but to the possible synsets that the word computer might represent, i.e.:
>>> type(wn.synsets('computer')[0])
<class 'nltk.corpus.reader.wordnet.Synset'>
>>> wn.synsets('computer')[0].hypernyms()
[Synset('machine.n.01')]
>>> wn.synsets('computer')[0].hyponyms()
[Synset('analog_computer.n.01'), Synset('digital_computer.n.01'), Synset('home_computer.n.01'), Synset('node.n.08'), Synset('number_cruncher.n.02'), Synset('pari-mutuel_machine.n.01'), Synset('predictor.n.03'), Synset('server.n.03'), Synset('turing_machine.n.01'), Synset('web_site.n.01')]
According to the definition, should words have hyper-/hyponyms? Or should concepts have hypo-/hypernyms?
Okay, then we have to make some assumptions.
1. Let's consider all synsets of a word accessed through WordNet as a "holistic" concept of any word form.
2. We consider the sum of all DIRECT hyper-/hyponyms of all synsets of a given word.
3. Based on the number of hyper-/hyponyms of all synsets that can be represented by a certain word form, we deduce that word X is more/less abstract than word Y.
>>> hypernym_count = lambda word: sum(len(ss.hypernyms()) for ss in wn.synsets(word))
>>> hyponym_count = lambda word: sum(len(ss.hyponyms()) for ss in wn.synsets(word))
>>> hyponym_count('computer')
14
>>> hypernym_count('computer')
2
>>> hypernym_count('tool')
8
>>> hyponym_count('tool')
32
Since (3) is the hypothesis that you want to test, you should be the one deciding which heuristic to use to deduce whether a word is more/less abstract based on the hyponym_count and hypernym_count results.
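As one purely illustrative way to turn the two counts into a single comparable number, here is a minimal sketch. The function name abstractness_ratio and the idea of using the share of hyponyms among all direct neighbours are assumptions made up for this example; they are not part of WordNet or NLTK, and nothing here shows that the ratio actually tracks abstractness.

```python
def abstractness_ratio(hyponym_count, hypernym_count):
    """Hypothetical score in [0, 1]: share of direct neighbours that are hyponyms.

    This is an illustrative heuristic only, not an established measure.
    """
    total = hyponym_count + hypernym_count
    if total == 0:
        # The word has no direct hyper-/hyponyms, so the ratio is undefined.
        return None
    return hyponym_count / total


# Counts copied from the interactive session above.
print(abstractness_ratio(14, 2))  # computer -> 0.875
print(abstractness_ratio(32, 8))  # tool     -> 0.8
```

With the counts shown above, this particular ratio scores computer (0.875) higher than tool (0.8), which goes against the intuition in the question. That is exactly the kind of check you would need to run before trusting any heuristic built on these counts.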
Wait a minute, what are DIRECT hyper-/hyponyms?
We're only accessing the hyper-/hyponyms one level above/below the synset. That's what "direct" means here.
Then how do I get all the hyponyms below a synset? See https://stackoverflow.com/a/42012001/610569
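For reference, a minimal sketch of collecting every hyponym below a synset, not just the direct ones, could look like the following. The helper names all_hyponyms and transitive_hyponym_count are made up for this example, and it is not necessarily the approach taken in the linked answer. It only relies on the hyponyms() method shown earlier. NLTK's Synset.closure can serve the same purpose.

```python
def all_hyponyms(synset):
    """Return the set of all synsets reachable below `synset` via hyponyms()."""
    seen = set()
    stack = list(synset.hyponyms())
    while stack:
        current = stack.pop()
        if current in seen:
            # A synset can be reached through more than one parent; visit it once.
            continue
        seen.add(current)
        stack.extend(current.hyponyms())
    return seen


def transitive_hyponym_count(word):
    """Count distinct hyponyms at any depth below all synsets of `word`."""
    # Imported here so all_hyponyms() works without NLTK installed;
    # this call needs the WordNet corpus to be downloaded.
    from nltk.corpus import wordnet as wn

    collected = set()
    for ss in wn.synsets(word):
        collected |= all_hyponyms(ss)
    return len(collected)
```

Taking the union across synsets avoids counting a hyponym twice when two senses of the same word share a descendant. Whether distinct senses should be merged like this at all is the same "holistic concept" assumption made earlier.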
That's for you to find out and tell us =) Have fun!