
K-means clustering with a dataframe (scipy)

I would like to run k-means clustering with more than three features. I've got it working with two features, and I'm wondering how to pass more than three features to sklearn.cluster.KMeans.

Here's my code, along with the dataframes I'd like to select features from. I have multiple dataframes as input, and I have to provide their columns as features.

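import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# currently two features are selected
# I'd like to combine more than 3 features and provide them to the dataset
# note: how='left' leaves NaN for ids missing from df_var, and KMeans rejects NaN
df_features = pd.merge(df_max[['id', 'max']],
                       df_var[['id', 'variance']], on='id', how='left')

# feature columns from 'max' through 'variance' ('id' is excluded);
# DataFrame.as_matrix() was removed in pandas 1.0, so use to_numpy() instead
cols = df_features.loc[:, 'max':'variance'].columns.tolist()
X = df_features[cols].to_numpy()

kmeans = KMeans(n_clusters=3, n_init=10)
kmeans.fit(X)

centroid = kmeans.cluster_centers_
labels = kmeans.labels_

colors = ["g.", "r.", "c."]

# print every point with its cluster label and plot it in that cluster's color
for i in range(len(X)):
    print("coordinate:", X[i], "label:", labels[i])
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize=10)

# mark the centroids (only the first two feature dimensions are drawn)
plt.scatter(centroid[:, 0], centroid[:, 1], marker="x", s=150, linewidths=5, zorder=10)

plt.show()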
  1. Generally you wouldn't want id to be a feature: unless you have good reason to believe otherwise, IDs are arbitrary labels that don't correlate with anything.

  2. As long as you pass a valid matrix X to kmeans.fit(X), it will run the k-means algorithm for you regardless of how many features (columns) X has. With a very large number of features it will take longer to finish, since the cost of each iteration grows linearly with the number of features.

  3. The real question is how to construct X. As your example shows, you can merge the dataframes, select the columns you want, and extract the feature matrix with .to_numpy() (.as_matrix() was deprecated and then removed in pandas 1.0). If you have more dataframes and columns, you just merge more of them and select more columns.

  4. Once you have more features than you need, feature selection and dimensionality reduction may come in handy. They're worth reading about when you have time.
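For point 3, here's a minimal sketch of merging any number of dataframes on 'id' and clustering on every resulting column. The four frames, the 'mean' and 'median' columns, and the build_feature_matrix helper are hypothetical stand-ins filled with random toy data. It uses how="inner" rather than the question's how='left' so that no NaN reaches KMeans; that is an assumption about how you want to treat ids missing from some frames.

```python
from functools import reduce

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-feature dataframes, each keyed by 'id' (random toy data).
rng = np.random.default_rng(0)
ids = np.arange(30)
frames = [
    pd.DataFrame({"id": ids, "max": rng.normal(size=30)}),
    pd.DataFrame({"id": ids, "variance": rng.random(30)}),
    pd.DataFrame({"id": ids, "mean": rng.normal(size=30)}),
    pd.DataFrame({"id": ids, "median": rng.normal(size=30)}),
]


def build_feature_matrix(frames, key="id"):
    """Hypothetical helper: merge all frames on `key`, return (column names, matrix)."""
    # Fold the list pairwise: ((f0 + f1) + f2) + f3; inner join avoids NaN rows.
    merged = reduce(lambda left, right: pd.merge(left, right, on=key, how="inner"), frames)
    feature_cols = [c for c in merged.columns if c != key]  # leave the id out
    return feature_cols, merged[feature_cols].to_numpy()


cols, X = build_feature_matrix(frames)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(cols)                           # ['max', 'variance', 'mean', 'median']
print(X.shape)                        # (30, 4)
print(kmeans.cluster_centers_.shape)  # (3, 4): one coordinate per feature
```

The same code works for 2, 4 or 40 features; cluster_centers_ simply gets one column per feature.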
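For point 4, one common option (my choice here, not the only one) is PCA. The sketch below assumes standardizing the features is acceptable: StandardScaler puts columns like max and variance on a comparable scale before PCA, which is sensitive to scale, and the data is projected onto two components so that a 2-D plot like the one in the question shows the whole feature space rather than only the first two columns. The data is random and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: 100 rows, 8 features (synthetic, for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))

# Standardize, project onto 2 principal components, then cluster in that 2-D space.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipe.fit_predict(X)

X_2d = pipe[:-1].transform(X)           # the reduced data the clusters were found in
centers_2d = pipe[-1].cluster_centers_  # centroids live in the same 2-D space
print(X_2d.shape, centers_2d.shape)     # (100, 2) (3, 2)
```

Note that here the clusters are found in the reduced space. If you only want PCA for plotting, fit KMeans on the full X and apply the scaler and PCA to X just for the plot.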

PS: Why scipy in the title? The KMeans used here comes from scikit-learn (sklearn.cluster), not SciPy.
