
Euclidean distance matrix

I'd like to calculate the Euclidean distance between two words. First, each phoneme was vectorised:

g = (0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0)
a = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0)
k = (0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0)
n = (0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0)
N = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

So the distance between two words, for example 'gaN' and 'gak', is:

from scipy.spatial import distance

dst1 = distance.euclidean(g, g)
dst2 = distance.euclidean(a, a)
dst3 = distance.euclidean(N, k)
dist = dst1 + dst2 + dst3
print(dist)

What I'd like to make is a large matrix showing all pairwise distances between more than 800 words. It should look like the table below (as a csv file):

    gaN   gak   gan  gal ...
gaN 0     1.73  1.41
gak 1.73  0     2.24
gan 1.41  2.24  0
gal
...

Could anyone help me with this? I'm currently using Python but R would be fine, too.

Euclidean distance can only operate on numeric objects, as you know. I'm not sure what a phoneme is, but if you already have numeric representations of all words, then this should be trivial. (In that case, is your problem translating the distance matrix back into the gaN/gak table? If so, more information is needed about how you get from a word to its phoneme vectors.)
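Going by the question, a word's distance is the sum of the per-position phoneme distances. Here is a minimal sketch of the full matrix built that way, using the phoneme vectors from the question; `word_distance` and the three-word list are illustrative (you would substitute your 800-word list), and it assumes all words being compared have the same number of phonemes:

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Phoneme feature vectors from the question (23 binary features each)
phonemes = {
    'g': np.array([0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0]),
    'a': np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0]),
    'k': np.array([0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]),
    'n': np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0]),
    'N': np.zeros(23),
}

def word_distance(w1, w2):
    # Sum of phoneme-by-phoneme Euclidean distances (assumes equal length)
    return sum(distance.euclidean(phonemes[p1], phonemes[p2])
               for p1, p2 in zip(w1, w2))

words = ['gaN', 'gak', 'gan']   # replace with the full 800-word list
mat = [[word_distance(w1, w2) for w2 in words] for w1 in words]

# Labelled matrix, written out as csv
df = pd.DataFrame(mat, index=words, columns=words)
df.round(2).to_csv('distances.csv')
```

This reproduces the example table: gaN/gak ≈ 1.73, gaN/gan ≈ 1.41, gak/gan ≈ 2.24. Note the double loop is O(n²) distance calls, which is still fine for ~800 words (~640k cheap vector operations).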

As for converting to csv, that's trivial. You can do it in a single extra line with the excellent pandas package:

import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

pd.DataFrame(euclidean_distances(tbl1, tbl2)).to_csv('distances.csv')
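As a variant of the above, here is a hedged sketch that concatenates each word's phoneme vectors into one long vector and gets the whole pairwise matrix in a single `cdist` call, with row/column labels. Caveat: Euclidean distance over the concatenation is not in general the same as summing per-phoneme distances (it is for these example words, since each pair differs in only one phoneme):

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Phoneme vectors from the question
g = np.array([0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0])
a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0])
k = np.array([0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0])
n = np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0])
N = np.zeros(23, dtype=int)

# Each word becomes one concatenated vector
words = {
    'gaN': np.concatenate([g, a, N]),
    'gak': np.concatenate([g, a, k]),
    'gan': np.concatenate([g, a, n]),
}
tbl = np.vstack(list(words.values()))

# cdist computes the full pairwise Euclidean matrix in one call
df = pd.DataFrame(cdist(tbl, tbl), index=list(words), columns=list(words))
df.round(2).to_csv('distances.csv')
```

The csv then has the word labels in the first row and column, matching the table shown in the question.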
