
Need to compare the similarity of K-means clusters

I need to compare the similarity of clusters, but the clustering runs produce clusters that are not equal in length.

Let's say I have 4 data points A, B, C and D, and assume this data set changes over a period of time. I run KMeans clustering on this data in the first hour and get 3 clusters [(A, B), (C), (D)]. Then I run KMeans clustering on this data again in the second hour and get another 3 clusters [(B, C), (A), (D)], and so on.

I need to measure how these clusters change over time by comparing the clusters from the first hour with those from the second and assigning a similarity score.

For example:

The third cluster in the first hour is 100% similar to the third cluster in the second hour, so there is no problem there. The problem is how to measure the others.

1- (A, B) started together and then dispersed. Suppose I say that (A, B) is 50% similar to (B, C).

2- I am not able to assign a score between (A, B) and (A), or between (C) and (A, B), because they are not the same length, and if I follow the approach of counting their members I get multiple similar scores.
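To make the scoring in the two points above concrete, here is a minimal sketch that treats each cluster as a set of point labels and scores every pair with the Dice coefficient, 2*|X ∩ Y| / (|X| + |Y|). This is one possible measure, not something given in the question; it is chosen here because it is defined for sets of different sizes and gives 50% for (A, B) versus (B, C). The names `dice`, `hour1`, `hour2` and `scores` are illustrative.

```python
from itertools import product

def dice(a, b):
    """Dice overlap of two sets: 2*|a & b| / (|a| + |b|), between 0 and 1."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty clusters count as identical
    return 2 * len(a & b) / (len(a) + len(b))

# Clusters from the question, written as sets of point labels
hour1 = [{"A", "B"}, {"C"}, {"D"}]
hour2 = [{"B", "C"}, {"A"}, {"D"}]

# Score every cluster of hour 1 against every cluster of hour 2.
# Keys are (index in hour1, index in hour2).
scores = {
    (i, j): dice(c1, c2)
    for (i, c1), (j, c2) in product(enumerate(hour1), enumerate(hour2))
}

for (i, j), s in sorted(scores.items()):
    print(f"hour1[{i}] vs hour2[{j}]: {s:.2f}")
```

With this measure (A, B) versus (A) scores about 0.67 and (D) versus (D) scores 1.0, so clusters of unequal length still get distinct, comparable scores.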

Does anyone have an idea how to solve this problem?

Check whether this idea works:

1- Run k-means clustering and save the centroids for each period of time you want.

2- Measure the movement of the centroids to compare every hour.

Hope this helps!
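A rough sketch of the centroid idea, under a few assumptions: the centroids are saved as plain coordinate tuples (with scikit-learn they would come from `KMeans.cluster_centers_`), the coordinates below are made up for illustration, and because k-means numbers its clusters arbitrarily on each run, the centroids are first matched between hours by trying every pairing and keeping the one with the smallest total distance. The helper name `centroid_shift` is hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from itertools import permutations

def centroid_shift(prev, curr):
    """Pair each previous centroid with a current one so that the total
    Euclidean distance is smallest, then return (matching, distances).
    matching[i] is the index in curr paired with prev[i].
    Brute force over all pairings, so only suitable for a small k."""
    k = len(prev)
    best = min(
        permutations(range(k)),
        key=lambda p: sum(math.dist(prev[i], curr[p[i]]) for i in range(k)),
    )
    moved = [math.dist(prev[i], curr[best[i]]) for i in range(k)]
    return list(best), moved

# Hypothetical 2-D centroids saved after each hourly run
hour1 = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
hour2 = [(10.0, 0.0), (0.0, 1.0), (5.0, 8.0)]

matching, moved = centroid_shift(hour1, hour2)
print(matching)  # which hour-2 centroid each hour-1 centroid maps to
print(moved)     # how far each centroid moved between the two hours
```

A small distance means the cluster stayed roughly where it was; a large one means it drifted. For a larger k the brute-force pairing becomes slow, and an assignment solver would be the usual replacement.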
