
Scikit-learn KMeans clustering - fit cluster with X features, predict cluster membership with X-1 features?

I am trying to solve a regression-style task (predicting the value of a 'count' field) using KMeans clustering. The idea is simple:

Fit the clusters on my training dataset:

 from sklearn import cluster

 k_means = cluster.KMeans(n_clusters=4, n_init=20, init='random')
 k_means.fit(df[['DistanceToMidnight','season','DayType','weather','temp','atemp','humidity','windspeed','count']])

Note that I do use 'count' in the clustering.

Then I want to use my test set, which is much the same except that it has no 'count' field. I want to determine cluster membership using all features EXCEPT 'count', and then set 'count' for each row in the test set to the 'count' coordinate of the assigned cluster center.

Any ideas on how to do this simply, using the standard functions of the KMeans class? I can't just call 'k_means.predict', since it will fail because the number of features does not match.

The simplest way I could think of is to construct a new KMeans object from the cluster centers of the already trained clustering, but I am not sure how to do this. Is it possible to create a new cluster.KMeans object by providing it with already defined cluster centroids?

  1. Find the nearest cluster center
  2. Use the missing value from the center

If you stick to the k-means principle, your best prediction is the value stored in the assigned center, unless you, e.g., build a separate regression model for each cluster.
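The two steps above can be sketched with plain NumPy. This is only an illustration under assumptions: the `centers` array, the toy test rows, and the helper name `predict_missing` are made up for the example; in practice the centers would come from `k_means.cluster_centers_`, with 'count' as the last column.

```python
import numpy as np

# Hypothetical cluster centers: columns are (temp, humidity, count).
# In practice these would be k_means.cluster_centers_.
centers = np.array([
    [10.0, 80.0,  50.0],
    [25.0, 40.0, 300.0],
])

# Test rows carry only the first two features (no 'count').
X_test = np.array([
    [12.0, 75.0],
    [24.0, 45.0],
    [ 9.0, 85.0],
])

def predict_missing(X_partial, centers, missing_col=-1):
    """Assign each row to the nearest center using only the known features,
    then return that center's value in the missing column."""
    # Drop the missing coordinate from the centers.
    known = np.delete(centers, missing_col, axis=1)
    # Squared Euclidean distances, shape (n_rows, n_centers).
    d2 = ((X_partial[:, None, :] - known[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, centers[labels, missing_col]

labels, predicted_count = predict_missing(X_test, centers)
print(labels)           # index of the nearest center per row
print(predicted_count)  # 'count' coordinate of that center
```

Every row assigned to the same cluster gets the same predicted value, so the prediction can take at most as many distinct values as there are clusters.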

You can first calculate all the centroids using K-Means. Then compute the Euclidean distance (e.g. with sklearn.metrics) from every point to every centroid, leaving out the coordinates you want to exclude. Finally, for each point, pick the cluster that minimizes the distance (np.argmin along the second axis, i.e. axis=1).
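A minimal sketch of this approach with scikit-learn follows. It assumes synthetic data in place of the question's DataFrame, with 'count' stored in the last column; the names `feature_cols`, `count_col`, and `centers_known` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data:
# two features plus 'count' as the last column.
X_train = np.vstack([
    rng.normal(loc=[0.0, 0.0, 10.0], scale=0.5, size=(50, 3)),
    rng.normal(loc=[5.0, 5.0, 100.0], scale=0.5, size=(50, 3)),
])

# Fit on all columns, including 'count'.
k_means = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

feature_cols = [0, 1]  # columns available at prediction time
count_col = 2          # column to be filled in

# Keep only the coordinates of the centroids that the test set also has.
centers_known = k_means.cluster_centers_[:, feature_cols]

# Test rows without 'count'.
X_test = np.array([[0.2, -0.1], [4.8, 5.3]])

# Distance from every test row to every centroid, shape (n_rows, n_clusters).
dist = euclidean_distances(X_test, centers_known)
nearest = np.argmin(dist, axis=1)

# Read 'count' off the assigned centroid.
predicted_count = k_means.cluster_centers_[nearest, count_col]
print(predicted_count)
```

With the DataFrame from the question, `X_test` would be the test frame's feature columns converted to an array, in the same column order used for fitting.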
