Visualisation of clusters returned by KMeans
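I am clustering with KMeans as shown below, but I don't know how to visualise the clusters, as in the figure below, to see how satisfied the customers are.
Code: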
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
documents = ["This little kitty came to play when I was eating at a restaurant.",
             "Merley has the best squooshy kitten belly.",
             "Google Translate app is incredible.",
             "If you open 100 tab in google you get a smileyface.",
             "Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)
true_k = 3
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=100, n_init=1)
model.fit(X)
Let's assume you have a way of knowing which k-means partition represents which sentiment. You can then plot a pie chart as follows:
import matplotlib.pyplot as plt  # needed for plt.pie, plt.axis and plt.show
print(model.labels_)  # For illustration, you can see which sentence is in which cluster
# Here we get the proportions
nb_samples = [sum(model.labels_ == j) for j in range(true_k)]
# On the next line the order is RANDOM. I do NOT know which cluster represents what.
# The first label should represent samples in cluster 0, and so on
labels = 'positive', 'neutral', 'negative'
colors = ['gold', 'red', 'lightblue']  # Same size as labels
# Pie chart
plt.pie(nb_samples, labels=labels, colors=colors, autopct='%1.1f%%')
plt.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
plt.show()
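The list comprehension that builds `nb_samples` counts how many samples fall into each cluster. A small sketch of the same count with `np.bincount`, plus an explicit dictionary that pins each cluster id to a sentiment name so the pie labels no longer depend on tuple order. The label array and the mapping below are made-up example data, not output from the question's model:

```python
import numpy as np

# Made-up cluster assignments for 8 samples (stand-in for model.labels_)
labels_ = np.array([0, 0, 1, 2, 0, 0, 1, 2])
true_k = 3

# Same counts as [sum(labels_ == j) for j in range(true_k)]
nb_samples = np.bincount(labels_, minlength=true_k)

# Hypothetical mapping, decided by inspecting the clusters yourself
cluster_to_sentiment = {0: "positive", 1: "neutral", 2: "negative"}
pie_labels = [cluster_to_sentiment[j] for j in range(true_k)]

# Percentages, as autopct='%1.1f%%' would show them on the pie
percentages = 100 * nb_samples / nb_samples.sum()
print(list(zip(pie_labels, nb_samples.tolist(), percentages.round(1).tolist())))
# [('positive', 4, 50.0), ('neutral', 2, 25.0), ('negative', 2, 25.0)]
```

`minlength=true_k` keeps a zero entry for any cluster that ends up with no samples, so the counts always line up with the labels list.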
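Also note that which cluster represents which category is not fixed: running the code several times can give different results.
This can be avoided by setting the NumPy random seed before fitting: because the KMeans model above is created without a random_state, it draws its random numbers from NumPy's global random state.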
import numpy as np
np.random.seed(42)  # Or any other integer; call this before model.fit(X)
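KMeans also accepts a `random_state` argument, which fixes the seed on the estimator itself instead of through NumPy's global state. A minimal sketch, assuming the question's documents and settings, that fits the model twice with each approach and checks that the label assignments match; `fit_labels` is a hypothetical helper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "This little kitty came to play when I was eating at a restaurant.",
    "Merley has the best squooshy kitten belly.",
    "Google Translate app is incredible.",
    "If you open 100 tab in google you get a smileyface.",
    "Best cat photo I've ever taken.",
    "Climbing ninja cat.",
    "Impressed with google map feedback.",
    "Key promoter extension for Google Chrome.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(documents)


def fit_labels(random_state=None):
    """Hypothetical helper: fit KMeans with the question's settings, return labels."""
    model = KMeans(n_clusters=3, init="k-means++", max_iter=100, n_init=1,
                   random_state=random_state)
    return model.fit(X).labels_


# Option 1: explicit random_state on the estimator
run_a = fit_labels(random_state=42)
run_b = fit_labels(random_state=42)

# Option 2: random_state=None, so KMeans draws from NumPy's global state
np.random.seed(42)
run_c = fit_labels()
np.random.seed(42)
run_d = fit_labels()

print(np.array_equal(run_a, run_b), np.array_equal(run_c, run_d))
```

Option 1 is usually easier to reason about, because other code that consumes NumPy's global random numbers between `np.random.seed` and `fit` cannot change the result.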