
What are Cluster, dissimilarity and distance in Python?

I am watching the MIT OpenCourseWare 6.0002 clustering video and I do not understand some of the code from that class.

What is this .Cluster?

for e in initialCentroids:
    clusters.append(cluster.Cluster([e]))

What is .distance?

for e in examples:
    smallestDistance = e.distance(clusters[0].getCentroid())

What is .dissimilarity?

minDissimilarity = cluster.dissimilarity(best)

From the code I can understand what they are doing, but I would like more detail about it. Pointers to related documentation would be highly appreciated!

These terms mainly describe data and the relationships between data points. Let's start with Cluster.

A cluster is a set of observed data points that share similar characteristics in some sense. Clustering is mainly a method of unsupervised learning. An easy way to picture it: a world map is a set of clusters that groups people by nationality, but, as in ML, some people end up scattered into other countries, which is normal up to a point.
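To make this concrete, here is a minimal sketch of what a cluster object could look like. It is not the course's cluster.py: the class name SimpleCluster, its method names, and the choice to represent points as tuples of floats are all assumptions made for illustration.

```python
class SimpleCluster:
    """Hypothetical stand-in for a cluster: a group of points plus their centroid."""

    def __init__(self, points):
        # points: non-empty list of equal-length tuples of numbers
        self.points = list(points)
        self.centroid = self._compute_centroid()

    def _compute_centroid(self):
        # The centroid is the coordinate-wise mean of the points.
        dims = len(self.points[0])
        n = len(self.points)
        return tuple(sum(p[d] for p in self.points) / n for d in range(dims))

    def get_centroid(self):
        return self.centroid


# A cluster built from a single point has that point as its centroid.
seed = SimpleCluster([(1.0, 2.0)])

# With two points, the centroid sits halfway between them.
pair = SimpleCluster([(0.0, 0.0), (2.0, 4.0)])
```

Seen this way, a call like cluster.Cluster([e]) in the question presumably starts a new cluster containing only the example e, so that e acts as the initial centroid.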

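If we take distance to mean the distance between clusters, the term refers to how far cluster 1's centroid is from cluster 2's centroid. The term may also refer to a single point: you measure the distance from the point to every cluster's centroid, and the point is assigned to the cluster with the smallest distance. That second sense is the one in your snippet, where e.distance(clusters[0].getCentroid()) measures how far example e is from the first cluster's centroid.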
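The point-to-centroid idea can be sketched as follows. This assumes Euclidean distance and points stored as tuples; the function names euclidean_distance and nearest_centroid_index are hypothetical and are not taken from the course code, which may use a different metric.

```python
import math


def euclidean_distance(p, q):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid_index(point, centroids):
    # Start with the first centroid as the best guess, then look for a closer one.
    smallest = euclidean_distance(point, centroids[0])
    index = 0
    for i in range(1, len(centroids)):
        d = euclidean_distance(point, centroids[i])
        if d < smallest:
            smallest = d
            index = i
    return index


centroids = [(0.0, 0.0), (10.0, 10.0)]
print(nearest_centroid_index((1.0, 1.0), centroids))  # index of the closer centroid
```

The loop has the same shape as the snippet in the question: take the distance to the first centroid as the smallest so far, then compare against the remaining ones.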

Dissimilarity describes much the same thing as distance: it tells you how unlike the data points are from their cluster's centroid. So when the distances are large, the dissimilarity is large as well. That is my understanding, though I am not sure about this one.
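One common way to turn this into a single number in k-means style clustering is to sum the squared distances from each point to its cluster's centroid, and then add those sums across all clusters. The sketch below assumes that definition rather than quoting the course's cluster.py; the function names and the representation of a clustering as a list of lists of tuples are hypothetical.

```python
def centroid(points):
    # Coordinate-wise mean of a non-empty list of equal-length tuples.
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def variability(points):
    # Sum of squared Euclidean distances from each point to the centroid.
    c = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)


def dissimilarity(clusters):
    # Total variability over all clusters; lower means tighter clusters.
    return sum(variability(points) for points in clusters)


# Two ways of splitting the same four points into two clusters.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
loose = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]

print(dissimilarity(tight) < dissimilarity(loose))
```

Under this definition, a line such as minDissimilarity = cluster.dissimilarity(best) would be keeping track of the lowest total found so far, so that the tightest clustering among several attempts can be kept.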

Hope it helps.
