I am trying to solve a regression task (predicting the value of the 'count' field) using KMeans clustering. The idea is simple:
Fit the clustering on my training dataset:
from sklearn import cluster
k_means = cluster.KMeans(n_clusters=4, n_init=20, init='random')
k_means.fit(df[['DistanceToMidnight','season','DayType','weather','temp','atemp','humidity','windspeed','count']])
*Note that I do use 'count' in the clustering.
Then I want to use my test set, which is much the same except that it has no 'count' field. I want to determine cluster membership using all features EXCEPT 'count', and then set 'count' for each row of the test set to the 'count' coordinate of the assigned cluster center.
Any ideas how to do this simply, using the standard functions of the KMeans class? I can't just call 'k_means.predict', since it will fail because the number of features does not match.
The simplest way I could think of is to construct a k_means clustering object from the cluster centers of the already trained clustering, but I am not sure how to do this. Is it possible to create a new cluster.KMeans object by providing it with already defined cluster centroids?
If you stick to the k-means principle, your best prediction is the value that was assigned to the center, unless you, e.g., build a regression model for each cluster independently.
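A rough sketch of the "regression model for each cluster" idea, in case it helps. Note the assumptions: here the clustering is fitted on the features only (without 'count'), so that `predict` can be called directly on new rows, and a plain `LinearRegression` is used per cluster. The helper names `fit_cluster_regressors` and `predict_cluster_regressors` and the synthetic data are made up for illustration; they are not from the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def fit_cluster_regressors(X, y, n_clusters=4, random_state=0):
    """Cluster the feature matrix X, then fit one linear model per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c          # training rows that fell into cluster c
        models[c] = LinearRegression().fit(X[mask], y[mask])
    return km, models

def predict_cluster_regressors(km, models, X_new):
    """Assign each new row to a cluster and use that cluster's model."""
    labels = km.predict(X_new)
    y_pred = np.empty(len(X_new))
    for c, model in models.items():
        mask = labels == c
        if mask.any():                  # skip clusters with no new rows
            y_pred[mask] = model.predict(X_new[mask])
    return y_pred

# Synthetic stand-in data (an assumption, not the asker's dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

km_demo, models_demo = fit_cluster_regressors(X_demo, y_demo, n_clusters=3)
y_hat = predict_cluster_regressors(km_demo, models_demo, X_demo[:5])
```

With real data, `X` would be the feature columns and `y` the 'count' column of the training frame.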
You can first calculate all the centroids using K-Means. Then compute the Euclidean distance (euclidean_distances from sklearn.metrics) from every point to all the centroids (except those you want to exclude). Finally, get the cluster that minimizes the distance (np.argmin along the 2nd axis) for each point.
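The steps above could be sketched roughly like this. It assumes the excluded part is the target coordinate of each centroid (the 'count' column), so distances are computed on the remaining coordinates only. The helper `predict_from_centroids` and the toy data are hypothetical and only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import euclidean_distances

def predict_from_centroids(k_means, X_partial, target_idx):
    """Find the nearest centroid ignoring the target coordinate,
    then return that centroid's target coordinate as the prediction."""
    centers = k_means.cluster_centers_
    # Column indices of the centroids that are also present in X_partial.
    feature_idx = [i for i in range(centers.shape[1]) if i != target_idx]
    # Distance matrix of shape (n_points, n_clusters).
    dists = euclidean_distances(X_partial, centers[:, feature_idx])
    nearest = np.argmin(dists, axis=1)  # closest cluster for each point
    return nearest, centers[nearest, target_idx]

# Toy data: two well-separated blobs; the last column plays the role of 'count'.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0, 10.0], scale=0.5, size=(50, 3))
blob_b = rng.normal(loc=[10.0, 10.0, 100.0], scale=0.5, size=(50, 3))
train = np.vstack([blob_a, blob_b])

k_means = KMeans(n_clusters=2, n_init=20, init='random', random_state=0).fit(train)

# Test rows have the same features but no 'count' column.
test_rows = np.array([[0.1, -0.2], [9.8, 10.1]])
labels, predicted_count = predict_from_centroids(k_means, test_rows, target_idx=2)
```

For the data frame in the question, `target_idx` would be the position of 'count' in the column list passed to `fit` (the last one), and the test columns would need to be in the same order as the remaining training columns.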