
Euclidean distance matrix

I'd like to calculate the Euclidean distance between two words. First, each phoneme was vectorised:

g = (0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0)
a = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0)
k = (0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0)
n = (0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0)
N = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

So the distance between two words, for example 'gaN' and 'gak', is:

from scipy.spatial import distance

dst1 = distance.euclidean(g, g)
dst2 = distance.euclidean(a, a)
dst3 = distance.euclidean(N, k)
dist = dst1 + dst2 + dst3
print(dist)

What I'd like to make is a large matrix showing all pairwise distances between more than 800 words. It should look like the table below (as a csv file):

    gaN   gak   gan  gal ...
gaN 0     1.73  1.41
gak 1.73  0     2.24
gan 1.41  2.24  0
gal
...

Could anyone help me with this? I'm currently using Python but R would be fine, too.

Euclidean distance can only operate on numeric objects, as you know. I'm not sure what a phoneme is, but if you already have numeric representations of all words, then this should be trivial. (In that case, is your problem translating the distance matrix back into the gaN/gak table? If so, more information is needed about how you get from a word to its phoneme vectors.)
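Going by the question, a word's distance is the sum of the per-position phoneme distances. Here is a minimal sketch of the full matrix built that way, using the phoneme vectors from the question; `word_distance` and the three-word list are illustrative (you would substitute your 800-word list), and it assumes all words being compared have the same number of phonemes:

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance

# Phoneme feature vectors from the question (23 binary features each)
phonemes = {
    'g': np.array([0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0]),
    'a': np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0]),
    'k': np.array([0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0]),
    'n': np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0]),
    'N': np.zeros(23),
}

def word_distance(w1, w2):
    # Sum of phoneme-by-phoneme Euclidean distances (assumes equal length)
    return sum(distance.euclidean(phonemes[p1], phonemes[p2])
               for p1, p2 in zip(w1, w2))

words = ['gaN', 'gak', 'gan']   # replace with the full 800-word list
mat = [[word_distance(w1, w2) for w2 in words] for w1 in words]

# Labelled matrix, written out as csv
df = pd.DataFrame(mat, index=words, columns=words)
df.round(2).to_csv('distances.csv')
```

This reproduces the example table: gaN/gak ≈ 1.73, gaN/gan ≈ 1.41, gak/gan ≈ 2.24. Note the double loop is O(n²) distance calls, which is still fine for ~800 words (~640k cheap vector operations).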

As for converting to csv, that's trivial. You can do it in a single extra line with the excellent pandas package:

import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances

pd.DataFrame(euclidean_distances(tbl1, tbl2)).to_csv('distances.csv')
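As a variant of the above, here is a hedged sketch that concatenates each word's phoneme vectors into one long vector and gets the whole pairwise matrix in a single `cdist` call, with row/column labels. Caveat: Euclidean distance over the concatenation is not in general the same as summing per-phoneme distances (it is for these example words, since each pair differs in only one phoneme):

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Phoneme vectors from the question
g = np.array([0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0])
a = np.array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0])
k = np.array([0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0])
n = np.array([0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0])
N = np.zeros(23, dtype=int)

# Each word becomes one concatenated vector
words = {
    'gaN': np.concatenate([g, a, N]),
    'gak': np.concatenate([g, a, k]),
    'gan': np.concatenate([g, a, n]),
}
tbl = np.vstack(list(words.values()))

# cdist computes the full pairwise Euclidean matrix in one call
df = pd.DataFrame(cdist(tbl, tbl), index=list(words), columns=list(words))
df.round(2).to_csv('distances.csv')
```

The csv then has the word labels in the first row and column, matching the table shown in the question.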
