Is wordnet path similarity commutative?
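I'm using the WordNet API from nltk. When I compare one synset with another, I get None,
but when I compare them the other way around, I get a float value.
Shouldn't they give the same value? Is there an explanation, or is this a bug in WordNet?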
Example:
wn.synset('car.n.01').path_similarity(wn.synset('automobile.v.01')) # None
wn.synset('automobile.v.01').path_similarity(wn.synset('car.n.01')) # 0.06666666666666667
Technically, without the dummy root, the car and automobile synsets have no link to each other:
>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None
Now, let's take a closer look at the dummy root issue. First, NLTK has a neat function that tells you whether a synset needs a dummy root:
>>> x._needs_root()
False
>>> y._needs_root()
True
Next, when you look at the path_similarity code (http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity), you can see:
def path_similarity(self, other, verbose=False, simulate_root=True):
    distance = self.shortest_path_distance(other, \
        simulate_root=simulate_root and self._needs_root())
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)
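The last two lines turn a path length d into a similarity of 1/(d+1), so the float from the question can be read back as a path length. A small sketch of that arithmetic; `similarity_from_distance` and `distance_from_similarity` are hypothetical helper names, not NLTK functions:

```python
from typing import Optional


def similarity_from_distance(distance: Optional[int]) -> Optional[float]:
    """Final step of path_similarity: None/negative -> None, else 1/(d+1)."""
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)


def distance_from_similarity(similarity: float) -> int:
    """Invert 1/(d+1) to recover d; round() absorbs floating-point error."""
    return round(1.0 / similarity - 1)


# The value returned by y.path_similarity(x) in the question is a 14-edge path.
print(distance_from_similarity(0.06666666666666667))  # 14
print(similarity_from_distance(14))                   # 0.06666666666666667
# No connecting path at all (what x.path_similarity(y) hits) maps to None.
print(similarity_from_distance(None))                 # None
```

Identical synsets (d = 0) get a similarity of 1.0, and the score shrinks as the path through the hierarchy gets longer.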
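So for the automobile synset, when you try y.path_similarity(x), the argument simulate_root=simulate_root and self._needs_root() will always be True (with the default simulate_root=True). When you try x.path_similarity(y), it will always be False, because x._needs_root() is False: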
>>> True and y._needs_root()
True
>>> True and x._needs_root()
False
Now, path_similarity() passes this flag down to shortest_path_distance() (https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance) and then to hypernym_distances(), which collects the hypernyms and their distances. Without simulate_root=True, the automobile synset will never connect to car, and vice versa:
>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2), (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7), (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11), (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5), (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4), (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9), (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5), (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6), (Synset('car.n.01'), 0), (Synset('container.n.01'), 4), (Synset('instrumentality.n.03'), 6)])
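To see the whole chain without downloading WordNet, here is a minimal, self-contained toy sketch. It is a simplified re-implementation for illustration, not NLTK's actual code: the `HYPERNYMS` graph is a hypothetical, heavily trimmed hierarchy (so the numbers differ from real WordNet), and `needs_root()` is simplified to "verbs need a root, nouns don't", which matches the `_needs_root()` results above. It reproduces the same asymmetry: only the calling synset decides whether `*ROOT*` is simulated.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical, heavily trimmed hypernym graph (not real WordNet data).
HYPERNYMS: Dict[str, List[str]] = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["entity.n.01"],
    "entity.n.01": [],
    "automobile.v.01": ["travel.v.01"],
    "travel.v.01": [],
}
ROOT = "*ROOT*"


def hypernym_distances(synset: str, simulate_root: bool = False) -> Dict[str, int]:
    """Shortest distance from `synset` to itself and each hypernym (BFS)."""
    dist: Dict[str, int] = {}
    queue = deque([(synset, 0)])
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue
        dist[node] = d
        queue.extend((hyp, d + 1) for hyp in HYPERNYMS[node])
    if simulate_root:
        # The dummy root is placed one step above the farthest hypernym.
        dist[ROOT] = max(dist.values()) + 1
    return dist


def needs_root(synset: str) -> bool:
    # Simplification: nouns share entity.n.01, verbs have no common top node.
    return synset.split(".")[-2] != "n"


def shortest_path_distance(a: str, b: str, simulate_root: bool = False) -> Optional[int]:
    if a == b:
        return 0
    da = hypernym_distances(a, simulate_root)
    db = hypernym_distances(b, simulate_root)
    common = set(da) & set(db)
    if not common:
        return None  # no shared ancestor: the synsets are not connected
    return min(da[s] + db[s] for s in common)


def path_similarity(a: str, b: str, simulate_root: bool = True) -> Optional[float]:
    # Only the caller's needs_root() is consulted: the source of the asymmetry.
    distance = shortest_path_distance(a, b, simulate_root and needs_root(a))
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)


print(hypernym_distances("automobile.v.01", simulate_root=True))
# {'automobile.v.01': 0, 'travel.v.01': 1, '*ROOT*': 2}
print(path_similarity("car.n.01", "automobile.v.01"))  # None
print(path_similarity("automobile.v.01", "car.n.01"))  # 0.16666666666666666
```

In the toy graph the path from automobile.v.01 to car.n.01 runs automobile → travel → *ROOT* → entity → motor_vehicle → car (5 edges), hence 1/6. Calling from car.n.01 never adds the root, so no path exists.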
So in theory, the correct path_similarity is 0/None, but because of the simulate_root=simulate_root and self._needs_root() argument, nltk.corpus.wordnet.path_similarity() in the NLTK API is not commutative.
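But the code is not buggy either: any comparison of synset distances that goes through the root will be constant, because the position of the dummy *ROOT* never changes. So the best practice is to compute path_similarity like this: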
>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> sims = [wn.path_similarity(x, y), wn.path_similarity(y, x)]
# When you NEVER want a None value, since going to
# the *ROOT* will always get you some sort of distance
# from synset x to synset y
# (Python 2 ordered None below every number; Python 3 raises TypeError, so filter None out)
>>> max((s for s in sims if s is not None), default=None)
# When you can allow None in synset similarity comparison
>>> None if None in sims else min(sims)
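I don't think this is a bug in WordNet itself. In your case, automobile is specified as a verb and car as a noun, so you need to look at the synsets to see what the graph looks like and decide whether the network is labeled correctly.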
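from nltk.corpus import wordnet as wn

A = 'car.n.01'
B = 'automobile.v.01'
C = 'automobile.n.01'  # resolves to the first noun sense of "automobile", i.e. Synset('car.n.01')
wn.synset(A).path_similarity(wn.synset(B))  # None
wn.synset(B).path_similarity(wn.synset(A))  # 0.06666666666666667
wn.synset(A).path_similarity(wn.synset(C))  # is 1
wn.synset(C).path_similarity(wn.synset(A))  # is also 1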
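Since the surprise comes from mixing a noun with a verb, one way to catch such pairs before comparing is to check the part-of-speech field of the synset names. A minimal sketch, assuming names follow NLTK's `lemma.pos.nn` convention; `synset_pos` and `same_pos` are hypothetical helpers, not NLTK functions (with real synset objects you could compare `synset.pos()` instead):

```python
def synset_pos(name: str) -> str:
    """Return the POS tag from a synset name such as 'car.n.01'."""
    # rsplit from the right so lemmas that themselves contain dots stay intact
    _lemma, pos, _sense = name.rsplit(".", 2)
    return pos


def same_pos(a: str, b: str) -> bool:
    """True if both synset names carry the same POS tag."""
    return synset_pos(a) == synset_pos(b)


A, B, C = "car.n.01", "automobile.v.01", "automobile.n.01"
print(same_pos(A, B))  # False: noun vs verb, separate hierarchies
print(same_pos(A, C))  # True: both nouns
```

Note that this compares tags literally, so adjective satellites ('s') and head adjectives ('a') would count as different.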