Confused with Clustering
I am confused about clustering in the data science process. We know that grouping similar points in a 2D space is based on this formula:
distance = sqrt( (x2-x1)^2 + (y2-y1)^2 )
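As a sketch, the formula above translates directly to Python. The version below is written for points with any number of coordinates, so the 2D case is just one instance of it. The function name `euclidean` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2D: rows 0 and 1 of the example table below, using both x and y
print(euclidean((5, 8), (6, 9)))  # 1.4142135623730951
# 1D: the same two rows, using only x
print(euclidean((5,), (6,)))  # 1.0
```

With a single coordinate the sum has one term, so the result is simply the absolute difference of the two values.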
But when passing inputs to scikit-learn, we only feed in the x-axis values. What happened to the y-axis values?
For example, suppose we have the following dataset:
index x y
------------------
0 5 8
1 6 9
2 7 10
and we pass only x to KMeans:
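import pandas as pd
from sklearn.cluster import KMeans
df = pd.DataFrame({"x": [5, 6, 7], "y": [8, 9, 10]})  # the example table above
kmeans = KMeans(n_clusters=2)
kmeans.fit(df[["x"]])  # double brackets give a one-column 2D DataFrame; fit() rejects the 1D Series df["x"]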
How can it calculate distances without the y values?
KMeans clustering can be done in any number of dimensions. As you said, the distance is calculated using the Euclidean distance, and that distance is defined for any number of dimensions: with n features, it is the square root of the sum of the n squared differences. You passed a single column, so in this case there is only one dimension, and the formula simplifies to:
distance = sqrt((x2-x1)^2)
which is really just the absolute value of (x2-x1).
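To make this concrete, here is a minimal from-scratch sketch of k-means (Lloyd's algorithm) on a single feature. It is not scikit-learn's implementation, which uses k-means++ initialization and several restarts; the function name `kmeans_1d` and the naive initialization (the k smallest distinct values) are assumptions chosen for illustration. Note that the only distance it ever computes is abs(x - c), so no y values are needed.

```python
def kmeans_1d(xs, k, n_iter=100):
    """Minimal Lloyd's k-means on one feature; returns (centroids, labels)."""
    # Naive initialization (an assumption): the k smallest distinct values.
    centroids = [float(c) for c in sorted(set(xs))[:k]]
    if len(centroids) < k:
        raise ValueError("need at least k distinct values")
    labels = []
    for _ in range(n_iter):
        # Assignment step: in 1D the Euclidean distance is just abs(x - c).
        new_labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in xs]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# The x column from the example table
centroids, labels = kmeans_1d([5, 6, 7], k=2)
print(centroids, labels)  # [5.0, 6.5] [0, 1, 1]
```

scikit-learn may number the clusters differently (labels are arbitrary), but the grouping principle is the same: each point goes to its nearest centroid along the single x axis.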