
How to add a legend to a Matplotlib plot of cluster data?

How do I add a legend to the plots in my scenario? The text argument is text = tfidf.transform(document), and the clusters argument holds the unsupervised cluster labels, which range from 0 to 19, each with its own bag of words. As it stands, there is no way to tell which color corresponds to which cluster.

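import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_tsne_pca(data, labels):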
    max_label = max(labels)
    max_items = np.random.choice(range(data.shape[0]), size=3000, replace=False)
    
    pca = PCA(n_components=2).fit_transform(data[max_items,:].todense())
    tsne = TSNE().fit_transform(PCA(n_components=50).fit_transform(data[max_items,:].todense()))
    
    
    idx = np.random.choice(range(pca.shape[0]), size=3000, replace=False)
    label_subset = labels[max_items]
    label_subset = [cm.hsv(i/max_label) for i in label_subset[idx]]
    f, ax = plt.subplots(1, 2, figsize=(20, 6))
    
    ax[0].scatter(pca[idx, 0], pca[idx, 1], c=label_subset)
    ax[0].set_title('PCA Cluster Plot')
    
    ax[1].scatter(tsne[idx, 0], tsne[idx, 1], c=label_subset)
    ax[1].set_title('TSNE Cluster Plot')


plot_tsne_pca(text, clusters)

Here is the full code example: https://pastebin.com/3PABg7xh

Without commenting out label_subset = [cm.hsv(i/max_label) for i in label_subset[idx]]

With the label_subset line commented out.

You can use legend_elements() on the object returned by scatter() to get the lists of artists and labels (or a subset of them) needed to build a legend. See "Automated legend creation" in the Matplotlib documentation for more details.

import matplotlib.pyplot as plt
from sklearn import (manifold, datasets)

# small labelled dataset: handwritten digits 0 to 5
digits = datasets.load_digits(n_class=6)
X = digits.data
y = digits.target

tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
X_tsne = tsne.fit_transform(X)


fig, ax = plt.subplots()
# c=y is numeric and is mapped to colors through cmap
sc = ax.scatter(X_tsne[:,0], X_tsne[:,1], c=y, cmap='tab10')
# legend_elements() returns (handles, labels), unpacked into legend()
ax.legend(*sc.legend_elements(), title='clusters')
plt.show()
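With 20 clusters, as in the question, one detail is worth checking: by default legend_elements() uses num="auto", which can return only a subset of the values when there are many distinct ones. The sketch below passes num=None to request one entry per unique label. The synthetic points, the 'tab20' colormap, and the output file name are assumptions made for illustration and are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_clusters = 20                                   # assumption: cluster ids 0..19, as in the question
labels = np.repeat(np.arange(n_clusters), 10)     # synthetic labels, every id present
points = rng.normal(size=(labels.size, 2))        # synthetic 2-D coordinates

fig, ax = plt.subplots()
# 'tab20' has 20 distinct colors; 'tab10' only has 10
sc = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab20")

# num=None asks for one legend entry per unique value in c
handles, texts = sc.legend_elements(num=None)
ax.legend(handles, texts, title="clusters", ncol=2, fontsize="small")
fig.savefig("clusters_legend.png")                # hypothetical output file name
```

Splitting the legend into two columns keeps 20 entries from running off the axes.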


EDIT

In your particular case, the code did not work because legend_elements() is meant for a scatter whose c= argument is a numeric array mapped through a colormap. You were instead passing a list of colors built by hand (label_subset = [cm.hsv(i/max_label) for i in label_subset[idx]]). If you remove that line, keep label_subset numeric, and map it to colors with cmap=, everything works as expected:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def plot_tsne_pca(data, labels, sizelist, cmap='tab10'):
    # sizelist is the number of points to sample; labels must be a NumPy array
    max_items = np.random.choice(range(data.shape[0]), sizelist, replace=False)
    # .toarray() returns an ndarray; recent scikit-learn rejects the np.matrix from .todense()
    dense = data[max_items, :].toarray()

    pca = PCA(n_components=2).fit_transform(dense)
    # reduce to 50 dimensions before t-SNE, as in the question
    tsne = TSNE().fit_transform(PCA(n_components=50).fit_transform(dense))

    idx = np.random.choice(range(pca.shape[0]), sizelist, replace=False)
    # keep the labels numeric and in the same order as the plotted points
    label_subset = labels[max_items][idx]
    f, ax = plt.subplots(1, 2, figsize=(20, 6))

    ax[0].scatter(pca[idx, 0], pca[idx, 1], c=label_subset, cmap=cmap)
    ax[0].set_title('PCA Cluster Plot')

    sc = ax[1].scatter(tsne[idx, 0], tsne[idx, 1], c=label_subset, cmap=cmap)
    ax[1].set_title('TSNE Cluster Plot')
    ax[1].legend(*sc.legend_elements(), title='clusters')


plot_tsne_pca(text, clusters, sizelist=3000)
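If you would rather keep the hand-built color list from the question, legend_elements() is not available, but a legend can still be assembled from proxy artists. The following is a minimal sketch of that alternative, not part of the original answer: the data is synthetic, and color_of is a hypothetical helper dictionary. It divides by max_label + 1 rather than max_label because the hsv colormap is cyclic, so values 0 and 1 would otherwise both come out red.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 20)          # synthetic cluster ids 0..4
points = rng.normal(size=(labels.size, 2))    # synthetic 2-D coordinates
max_label = labels.max()

hsv = plt.get_cmap("hsv")
unique_labels = np.unique(labels)
# hypothetical helper: one fixed RGBA color per cluster id
color_of = {k: hsv(k / (max_label + 1)) for k in unique_labels}
colors = [color_of[k] for k in labels]        # hand-built color list, as in the question

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1], c=colors)

# proxy artists: one invisible-line marker per cluster, used only for the legend
handles = [
    Line2D([], [], marker="o", linestyle="", color=color_of[k], label=str(k))
    for k in unique_labels
]
legend = ax.legend(handles=handles, title="clusters")
```

The numeric c= plus cmap= route shown above is shorter; proxy artists are mainly useful when the colors come from somewhere a colormap cannot express.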

