Cluster a Distance Matrix in Python

I have a distance matrix of the form:

        str1    str2    str3    str4    ...     strn
str1    0.8     0.4     0.6     0.1     ...     0.2
str2    0.4     0.7     0.5     0.1     ...     0.1
str3    0.6     0.5     0.6     0.1     ...     0.1
str4    0.1     0.1     0.1     0.5     ...     0.6
.       .       .       .       .       ...     .
.       .       .       .       .       ...     .
.       .       .       .       .       ...     .
strn    0.2     0.1     0.1     0.6     ...     0.7

Each element holds a value for a pair of strings, string i and string j, calculated from their similarity: the more similar the strings, the higher the value. As the matrix shows, a string compared with itself does not get exactly 1 or 0, but its value is still high.

My requirement is to cluster the strings based on these values so that the most similar strings end up together. For example, the five strings shown here should be clustered as: [str1, str2, str3], [str4, strn].

I am looking for a Python library to do this.

Since you already have similarity values, try hierarchical clustering. For example, SciPy provides several methods for it.

* Don't forget to convert your similarity matrix to a distance matrix first.
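
A minimal sketch of that suggestion with SciPy, using only the five rows and columns visible in the question (the elided "..." entries are left out). Several choices here are assumptions rather than anything stated above: the conversion distance = 1 - similarity, forcing the diagonal to zero, average linkage, and asking for exactly two clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

labels = ["str1", "str2", "str3", "str4", "strn"]

# Similarity values copied from the visible part of the matrix in the question.
sim = np.array([
    [0.8, 0.4, 0.6, 0.1, 0.2],
    [0.4, 0.7, 0.5, 0.1, 0.1],
    [0.6, 0.5, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.5, 0.6],
    [0.2, 0.1, 0.1, 0.6, 0.7],
])

# Assumption: similarities lie in [0, 1], so 1 - similarity works as a distance.
dist = 1.0 - sim
# squareform expects a symmetric matrix with a zero diagonal, so the
# self-similarity values (which are not exactly 1) are overridden here.
np.fill_diagonal(dist, 0.0)

# linkage takes the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist)
Z = linkage(condensed, method="average")  # assumption: average linkage

# Assumption: cut the tree into two flat clusters.
cluster_ids = fcluster(Z, t=2, criterion="maxclust")

# Group the string labels by their assigned cluster id.
clusters = {}
for label, cid in zip(labels, cluster_ids):
    clusters.setdefault(int(cid), []).append(label)

print(list(clusters.values()))
```

On this sample the two groups come out as str1, str2, str3 and str4, strn. If the number of clusters is not known in advance, fcluster can instead cut the tree at a distance threshold with criterion="distance".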
