I am using the following Python code to cluster my data points with k-means.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21], [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])
kmeans_clustering = KMeans( n_clusters = 3 )
idx = kmeans_clustering.fit_predict( data )
# reduce the data to 2 dimensions with t-SNE
X = TSNE(n_components=2).fit_transform( data )
fig = plt.figure(1)
plt.clf()
# plot the embedding, one color per cluster label
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
plt.scatter(X[:,0], X[:,1], c=colors[kmeans_clustering.labels_])
plt.title('K-Means (t-SNE)')
plt.show()
However, the plot of the clusters I get is wrong: all the points end up in a single spot.
Where is my code going wrong? I want to see the k-means clusters separately in my scatter plot.
EDIT
The t-SNE values I get are as follows.
[[ 1.12758575e-04 9.30458337e-05]
[ -1.82559784e-04 -1.06657936e-04]
[ -9.56485652e-05 -2.38951623e-04]
[ 5.56515580e-05 -4.42453191e-07]
[ -1.42039677e-04 -5.62548119e-05]]
Use the perplexity parameter of TSNE. Its default value is 30, which seems to be too high for your case (you have only 5 samples), even though the documentation states that t-SNE is quite insensitive to this parameter:
The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. The choice is not extremely critical since t-SNE is quite insensitive to this parameter.
X = TSNE(n_components=2, perplexity=2.0).fit_transform( data )
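To check that a lower perplexity actually spreads the points out, here is a minimal sketch using the data from the question. The helper name embed_with_tsne and the fixed random_state are illustrative assumptions, not part of the original code. Capping the perplexity below the number of samples is also an assumption about what is sensible for a dataset this small (recent scikit-learn releases reject a perplexity that is not smaller than the number of samples).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

data = np.array([[30, 17, 10, 32, 32],
                 [18, 20, 6, 20, 15],
                 [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17],
                 [3, 15, 21, 17, 20]])

def embed_with_tsne(points, perplexity=2.0, random_state=0):
    """Hypothetical helper: 2-D t-SNE embedding with a perplexity capped below n_samples."""
    n_samples = points.shape[0]
    # Assumption: keep the perplexity strictly smaller than the number of samples.
    perplexity = min(perplexity, n_samples - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return tsne.fit_transform(points)

# Cluster in the original 5-D space, then embed only for visualization.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
X = embed_with_tsne(data)

# The range per axis shows whether the points are still collapsed into one spot.
spread = X.max(axis=0) - X.min(axis=0)
print("embedding shape:", X.shape)
print("range per axis:", spread)
```

If the ranges are no longer on the order of 1e-4, the embedding can be passed to plt.scatter(X[:,0], X[:,1], c=labels) as in the question.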
You could also use PCA (Principal Components Analysis) instead of t-SNE to plot your clusters:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
data = np.array([[30, 17, 10, 32, 32], [18, 20, 6, 20, 15], [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17], [3, 15, 21, 17, 20]])
# cluster in the original 5-dimensional space
kmeans = KMeans(n_clusters = 3)
labels = kmeans.fit_predict(data)
# project onto the first two principal components for plotting
pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)
data_reduced = pd.DataFrame(data_reduced)
ax = data_reduced.plot(kind='scatter', x=0, y=1, c=labels, cmap='rainbow')
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
ax.set_title('Projection of the clustering on the PCA axes')
# annotate each point with its cluster label
for x, y, label in zip(data_reduced[0], data_reduced[1], kmeans.labels_):
    ax.annotate('Cluster {0}'.format(label), (x, y))
plt.show()
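As a side check, it may help to see how much of the data's variance the two plotted components retain, since a low value would mean the 2-D picture hides part of the cluster structure. The following is a minimal sketch using PCA's explained_variance_ratio_ attribute on the same data; the variable names are only illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[30, 17, 10, 32, 32],
                 [18, 20, 6, 20, 15],
                 [10, 8, 10, 20, 21],
                 [3, 16, 20, 10, 17],
                 [3, 15, 21, 17, 20]])

pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# Fraction of the total variance captured by each of the two components.
ratio = pca.explained_variance_ratio_
print("variance explained per component:", ratio)
print("total variance kept in the 2-D plot:", ratio.sum())
```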