
Save a k-means model to cluster future data of the same kind

I am currently clustering a data set. Is there a way to save the resulting groups so that, in the future, I can take new data and determine which group each point belongs to according to the k-means "model" I built?

I have learned to work with k-means, and it is very interesting, but right now, when I want to know which group a new data point belongs to, I repeat the whole analysis. What I would like is to use the old data (we could call it training data) to define the group of a new data point. Is that possible?

This is my code right now.

from sklearn.cluster import KMeans

n_clusters = 15
kmeans = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=3000, n_init=100, random_state=0)
y_kmeans = kmeans.fit_predict(data)

# Store each row's cluster label next to the original data
data_df['k-means'] = y_kmeans

If I plot my current results, the clusters already cover the entire data range. Therefore, any new data point must belong to one of the current groups.

import matplotlib.pyplot as plt

# Visualising the clusters, one color per cluster
colors = ['blue', 'orange', 'green', 'red', 'yellow', 'cyan', 'brown', 'cadetblue', 'gray',
          'salmon', 'olive', 'deeppink', 'pink', 'gold', 'lime']
for i in range(n_clusters):
    plt.scatter(data[y_kmeans == i, 0], data[y_kmeans == i, 1], color=colors[i])

# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], label='Centroids')

plt.legend()
plt.show()


Obviously, with new data I will also re-examine the data for variations.

Thank you very much.

You can simply keep the cluster centers and assign each new data point to the nearest cluster (i.e., the one that minimizes the Euclidean distance).

This is what the prediction step in k-means does.
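As a minimal sketch of that assignment rule, the snippet below computes the nearest center by hand with NumPy and compares it with scikit-learn's `KMeans.predict`. The random training data, the cluster count, and the helper name `assign_to_nearest` are assumptions made only for illustration; they are not from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for the "old" (training) data and the new points
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2))
new_points = rng.normal(size=(10, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train)

def assign_to_nearest(points, centers):
    """Hypothetical helper: index of the closest center for each point."""
    # Pairwise squared Euclidean distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

manual_labels = assign_to_nearest(new_points, kmeans.cluster_centers_)
predicted_labels = kmeans.predict(new_points)
```

Both arrays should contain the same labels, so storing only the centers is enough to classify new points later without refitting.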

The cluster centers are available as kmeans.cluster_centers_ (an attribute of the fitted KMeans object, not of the y_kmeans label array).
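If you would rather keep the whole fitted model than just the centers, one common option is to serialize it with `joblib` and call `predict` after reloading. The sketch below assumes `joblib` is installed (it ships as a scikit-learn dependency); the synthetic data, the temporary directory, and the file name `kmeans_model.joblib` are placeholders, not part of the question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the original training data
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 2))
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(data)

# Save the fitted model to disk (placeholder path in a temp directory)
model_path = os.path.join(tempfile.mkdtemp(), "kmeans_model.joblib")
joblib.dump(kmeans, model_path)

# Later, possibly in another session: reload and label new data
loaded = joblib.load(model_path)
new_data = rng.normal(size=(5, 2))
new_labels = loaded.predict(new_data)
```

Any preprocessing applied before fitting (scaling, for example) has to be applied to the new data in the same way, and a pickled model is generally only safe to reload with the same scikit-learn version that wrote it.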

