Cluster a Distance Matrix in Python

Question

I have a distance matrix of the form:

        str1    str2    str3    str4    ...     strn
str1    0.8     0.4     0.6     0.1     ...     0.2
str2    0.4     0.7     0.5     0.1     ...     0.1
str3    0.6     0.5     0.6     0.1     ...     0.1
str4    0.1     0.1     0.1     0.5     ...     0.6
.       .       .       .       .       ...     .
.       .       .       .       .       ...     .
.       .       .       .       .       ...     .
strn    0.2     0.1     0.1     0.6     ...     0.7

Each element holds a value for a pair of strings, string i and string j, calculated from their similarity: the more similar the strings, the higher the value. As the matrix shows, a string compared with itself does not get exactly 1 or 0, but its value is still high.

My requirement is to cluster the strings based on these values so that the most similar strings end up together. For example, the five strings shown here should be clustered as [str1, str2, str3] and [str4, strn].

I am looking for a Python library to do this.

Answer 1

Since you already have similarity values, try hierarchical clustering. For example, SciPy provides several methods for it.

*Don't forget to convert your similarity matrix to a distance matrix first, for example by taking 1 - similarity if your values lie in [0, 1].
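To make this concrete, here is a minimal sketch using only the five strings visible in the question (the elided rows and columns are left out). It assumes the similarities lie in [0, 1], takes distance = 1 - similarity, and forces the diagonal to zero because SciPy expects a string to be at distance 0 from itself. The average linkage method and the cut into two clusters are my choices for illustration, not something the question prescribes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# The five strings shown in the question; the "..." entries are omitted.
names = ["str1", "str2", "str3", "str4", "strn"]
similarity = np.array([
    [0.8, 0.4, 0.6, 0.1, 0.2],
    [0.4, 0.7, 0.5, 0.1, 0.1],
    [0.6, 0.5, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.5, 0.6],
    [0.2, 0.1, 0.1, 0.6, 0.7],
])

# Assumption: similarities are in [0, 1], so 1 - similarity is a usable distance.
distance = 1.0 - similarity
# A valid distance matrix needs a zero diagonal (self-distance is 0).
np.fill_diagonal(distance, 0.0)

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(distance, checks=True)

# Average linkage is one of several methods; "single", "complete", etc. also exist.
Z = linkage(condensed, method="average")

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

# Group the string names by their cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(int(label), []).append(name)

print(sorted(clusters.values()))
```

With this data the two groups come out as [str1, str2, str3] and [str4, strn], which matches what you expected. If you don't know the number of clusters in advance, you can cut the tree at a distance threshold instead by passing criterion="distance" to fcluster.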
