
Euclidean distance matrix

I'd like to calculate the Euclidean distance between two words. First of all, each phoneme was vectorised:

g = (0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0)
a = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0)
k = (0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0)
n = (0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0)
N = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

So the distance between two words, 'gaN' and 'gak', for example, is

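from scipy.spatial import distance

# compare the two words phoneme by phoneme, then add up the distances
dst1 = distance.euclidean(g, g)
dst2 = distance.euclidean(a, a)
dst3 = distance.euclidean(N, k)
dist = dst1 + dst2 + dst3
print(dist)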
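The position-by-position sum above can be wrapped in a small helper. This is only a sketch: the dictionary name `PHONEMES` and the function name `word_distance` are made up for illustration, and it assumes that every character of a word is exactly one phoneme and that both words have the same number of phonemes. It uses `math.dist` from the standard library (Python 3.8+) in place of `distance.euclidean`.

```python
import math

# Phoneme feature vectors copied from the question (23 binary features each).
PHONEMES = {
    "g": (0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0),
    "a": (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0),
    "k": (0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0),
    "n": (0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0),
    "N": (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
}

def word_distance(w1, w2):
    """Sum of the Euclidean distances between phonemes at the same position."""
    # Assumption: words are compared position by position, so lengths must match.
    if len(w1) != len(w2):
        raise ValueError("words must have the same number of phonemes")
    return sum(math.dist(PHONEMES[p], PHONEMES[q]) for p, q in zip(w1, w2))

print(round(word_distance("gaN", "gak"), 2))  # 1.73
```

Words of different lengths would need some alignment rule, which the question does not specify, so the sketch simply rejects them.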

What I'd like to make is a huge matrix that shows all distances between over 800 words. It should look like the table below (saved as a CSV file):

    gaN   gak   gan  gal ...
gaN 0     1.73  1.41
gak 1.73  0     2.24
gan 1.41  2.24  0
gal
...

Could anyone help me with this? I'm currently using Python, but R would be fine, too.

Euclidean distance can only operate on numeric objects, as you know. I'm not sure what a phoneme is, but if you already have numeric representations of all words, then it should be trivial. (In this case, is your problem translating the distance matrix back to the gaN, gak table? If so, more information is needed about how you get from there to the phoneme objects.)

As for converting to CSV, that's trivial. You can actually do it with zero additional lines using the excellent pandas package:

import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances
# tbl1, tbl2: numeric tables with one row per word
pd.DataFrame(euclidean_distances(tbl1, tbl2)).to_csv('distances.csv')
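To get the labelled table from the question, the word list can be passed as both the index and the columns of the DataFrame. The sketch below is one possible way to do this, not a definitive implementation: `PHONEMES`, `distance_table` and the three-word list are illustrative names and data, and it assumes one character per phoneme and words of equal length. It adds up the distances position by position, as in the question; a single Euclidean distance over concatenated word vectors would give a different number whenever two words differ in more than one position.

```python
import numpy as np
import pandas as pd

# Phoneme feature vectors copied from the question (23 binary features each).
PHONEMES = {
    "g": (0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0),
    "a": (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0),
    "k": (0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0),
    "n": (0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0),
    "N": (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
}

def distance_table(words):
    """Pairwise word distances as a DataFrame labelled with the words."""
    # X has shape (n_words, n_positions, n_features); equal word lengths assumed.
    X = np.array([[PHONEMES[p] for p in w] for w in words], dtype=float)
    D = np.zeros((len(words), len(words)))
    for pos in range(X.shape[1]):
        A = X[:, pos, :]
        # Pairwise Euclidean distance between the phonemes at this position.
        D += np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    return pd.DataFrame(D, index=words, columns=words)

words = ["gaN", "gak", "gan"]  # stand-in for the real list of 800+ words
table = distance_table(words)
print(table.round(2))
# table.to_csv("distances.csv") would write the labelled matrix to a file.
```

For the full word list, the broadcasting step allocates an array of size n_words × n_words × n_features per position, which should still be manageable for about 800 words and 23 features.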
