How do I add a legend to the plots in my scenario? The text parameter is text = tfidf.transform(document), and the clusters parameter holds the unsupervised cluster labels, ranging from 0 to 19, where each cluster has its own bag of words. At the moment it is impossible to tell which color corresponds to which cluster.
def plot_tsne_pca(data, labels):
    max_label = max(labels)
    max_items = np.random.choice(range(data.shape[0]), size=3000, replace=False)
    pca = PCA(n_components=2).fit_transform(data[max_items,:].todense())
    tsne = TSNE().fit_transform(PCA(n_components=50).fit_transform(data[max_items,:].todense()))
    idx = np.random.choice(range(pca.shape[0]), size=3000, replace=False)
    label_subset = labels[max_items]
    label_subset = [cm.hsv(i/max_label) for i in label_subset[idx]]
    f, ax = plt.subplots(1, 2, figsize=(20, 6))
    ax[0].scatter(pca[idx, 0], pca[idx, 1], c=label_subset)
    ax[0].set_title('PCA Cluster Plot')
    ax[1].scatter(tsne[idx, 0], tsne[idx, 1], c=label_subset)
    ax[1].set_title('TSNE Cluster Plot')

plot_tsne_pca(text, clusters)
Here is the full example of the code: https://pastebin.com/3PABg7xh
You can use legend_elements(), a method of the object returned by scatter(), to automatically return the lists of artists/labels (or a subset thereof) for legend creation. See "Automated legend creation" in the matplotlib documentation for more details.
import matplotlib.pyplot as plt
from sklearn import datasets, manifold
digits = datasets.load_digits(n_class=6)
X = digits.data
y = digits.target
tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
X_tsne = tsne.fit_transform(X)
fig, ax = plt.subplots()
sc = ax.scatter(X_tsne[:,0], X_tsne[:,1], c=y, cmap='tab10')
ax.legend(*sc.legend_elements(), title='clusters')
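One caveat for the 20 clusters in the question: with its default num="auto", legend_elements() may list only a subset of the values when there are many unique labels. Below is a minimal sketch, assuming synthetic stand-in data (the points and labels arrays are made up for illustration, not the TF-IDF matrix from the question), that passes num=None to request one entry per unique label and uses the tab20 colormap so that 20 labels map to 20 distinct colors.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_clusters = 20

# Hypothetical stand-in data: every label 0..19 occurs, points are shifted per label
labels = np.arange(400) % n_clusters
points = rng.normal(size=(400, 2)) + labels[:, None]

fig, ax = plt.subplots(figsize=(8, 6))
sc = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab20", s=10)

# num=None asks for one legend entry per unique value of c
handles, legend_labels = sc.legend_elements(num=None)

# Two columns, placed outside the axes so the legend does not cover the points
ax.legend(handles, legend_labels, title="clusters", ncol=2,
          bbox_to_anchor=(1.02, 1), loc="upper left")
```

Call plt.show() or fig.savefig(...) afterwards to see the result. The ncol and bbox_to_anchor settings are only a layout choice to keep a 20-entry legend readable.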
EDIT
In your particular case, the code was not working because legend_elements() is meant to be used when you have a mapping between a numeric c= list and a colormap. Instead, you were passing a list of colors that you constructed by hand (label_subset = [cm.hsv(i/max_label) for i in label_subset[idx]]). If you remove that line, keep label_subset numeric, and map it to colors using cmap=, then everything works as expected:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_tsne_pca(data, labels, sizelist, cmap='tab10'):
    # Randomly sample sizelist rows of the sparse matrix
    max_items = np.random.choice(range(data.shape[0]), sizelist, replace=False)
    # toarray() returns an ndarray; newer scikit-learn versions reject the np.matrix from todense()
    dense = data[max_items, :].toarray()
    pca = PCA(n_components=2).fit_transform(dense)
    # Reduce to 50 components before t-SNE, as in the question
    tsne = TSNE().fit_transform(PCA(n_components=50).fit_transform(dense))
    idx = np.random.choice(range(pca.shape[0]), sizelist, replace=False)
    # Keep the labels numeric and in the same order as the plotted points
    label_subset = np.asarray(labels)[max_items][idx]
    f, ax = plt.subplots(1, 2, figsize=(20, 6))
    ax[0].scatter(pca[idx, 0], pca[idx, 1], c=label_subset, cmap=cmap)
    ax[0].set_title('PCA Cluster Plot')
    sc = ax[1].scatter(tsne[idx, 0], tsne[idx, 1], c=label_subset, cmap=cmap)
    ax[1].set_title('TSNE Cluster Plot')
    ax[1].legend(*sc.legend_elements(), title='clusters')

# sizelist is the number of sampled points (3000 in the question)
plot_tsne_pca(text, clusters, sizelist=3000)
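If you would rather keep the hand-built color list from the question, legend_elements() is not available, but you can build the legend yourself from proxy artists. This is only a sketch under assumptions: the points and labels arrays are synthetic stand-ins, and plt.get_cmap("hsv") is used in place of cm.hsv (it returns the same colormap).

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 300 random 2-D points with labels 0..19
labels = np.arange(300) % 20
points = rng.normal(size=(300, 2))
max_label = labels.max()

hsv = plt.get_cmap("hsv")

# Hand-built colors, using the same formula as the question
point_colors = [hsv(i / max_label) for i in labels]

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1], c=point_colors, s=10)

# One invisible proxy marker per cluster, colored with the same formula
proxies = [
    Line2D([], [], marker="o", linestyle="", color=hsv(k / max_label), label=str(k))
    for k in np.unique(labels)
]
ax.legend(handles=proxies, title="clusters", ncol=2)
```

The proxies draw nothing on the axes; they exist only to give the legend one colored marker per cluster. Since hsv is a cyclic colormap, the first and last clusters end up with very similar reds under this formula, which is one more reason to prefer a qualitative colormap through cmap=.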