3D PCA in matplotlib: how to add legend?

Question

I am attempting to use http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html with my own data to construct a 3D PCA plot. The tutorial, however, does not show how to add a legend. Another page, https://matplotlib.org/users/legend_guide.html, does, but I cannot see how to apply the information in the second tutorial to the first.

How can I modify the code below to add a legend?

# Code source: Gaël Varoquaux
# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data  # the floating point values
y = iris.target  # unsigned integers specifying group


fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

for name, label in [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.spectral,
           edgecolor='k')

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])

plt.show()

Answer 1

There are some issues with the other answer that neither the OP nor the answerer seems to be clear about; this is hence not a complete answer, but rather an appendix to the existing one.

The spectral colormap was removed from matplotlib in version 2.2; use Spectral, nipy_spectral, or any other valid colormap.
Any colormap in matplotlib ranges from 0 to 1. If you call it with any value outside that range, it will just give you the outermost color. To get a color from a colormap you hence need to normalize the values. This is done via a Normalize instance; in this case it is internal to scatter.
Hence use sc = ax.scatter(...) and then sc.cmap(sc.norm(value)) to get a color according to the same mapping that is used within the scatter. Therefore the code should rather use
```
[sc.cmap(sc.norm(i)) for i in [1, 2, 0]]
```
The legend is outside the figure. The figure is 4 x 3 inches in size (figsize=(4, 3)). The axes take 95% of that space in width (rect=[0, 0, .95, 1]). The call to legend places the legend's right center point at 1.7 times the axes width (bbox_to_anchor=(1.7, 0.5)), i.e. at 4*0.95*1.7 = 6.46 inches, which is beyond the 4 inch width of the figure.

Alternative suggestion from my side: make the figure larger (figsize=(5.5, 3)) so that the legend fits in, and make the axes take only 70% of the figure width, so that 30% is left for the legend. Position the legend's left side close to the axes boundary (bbox_to_anchor=(1.0, .5)).

For more on this topic see "How to put the legend out of the plot".
The reason you still see the complete figure including the legend in a Jupyter notebook is that Jupyter will just save everything inside the canvas, even the parts that overflow the figure, and thereby enlarge the figure.
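
The placement arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper name anchor_x_inches is made up for illustration, and it assumes the axes start at the left edge of the figure, as rect=[0, ...] does in both versions of the code.

```python
def anchor_x_inches(fig_width_in, axes_width_frac, anchor_x):
    """Distance in inches from the left figure edge to the legend anchor.

    Hypothetical helper: assumes the axes' left edge sits at x=0 of the
    figure, so an anchor given in axes coordinates scales with axes width.
    """
    axes_width_in = fig_width_in * axes_width_frac
    return axes_width_in * anchor_x


# Other answer: figsize=(4, 3), rect width .95, bbox_to_anchor=(1.7, .5)
original = anchor_x_inches(4, 0.95, 1.7)
# Suggestion here: figsize=(5.5, 3), rect width .7, bbox_to_anchor=(1.0, .5)
suggested = anchor_x_inches(5.5, 0.7, 1.0)

print(f"original anchor:  {original:.2f} in, figure is 4 in wide")
print(f"suggested anchor: {suggested:.2f} in, figure is 5.5 in wide")
```

The first anchor lies at about 6.46 inches, to the right of the figure edge, while the second lies at about 3.85 inches and leaves roughly 1.65 inches of the figure for the legend.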

In total the code may then look like

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np; np.random.seed(5)
from sklearn import decomposition, datasets 

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data  # the floating point values
y = iris.target  # unsigned integers specifying group

fig = plt.figure(figsize=(5.5, 3))
ax = Axes3D(fig, rect=[0, 0, .7, 1], elev=48, azim=134)
fig.add_axes(ax)  # newer matplotlib no longer adds an Axes3D to the figure by itself

pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

labelTups = [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]
for name, label in labelTups:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(float)
sc = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap="Spectral", edgecolor='k')

ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])

colors = [sc.cmap(sc.norm(i)) for i in [1, 2, 0]]
custom_lines = [plt.Line2D([],[], ls="", marker='.', 
                mec='k', mfc=c, mew=.1, ms=20) for c in colors]
ax.legend(custom_lines, [lt[0] for lt in labelTups], 
          loc='center left', bbox_to_anchor=(1.0, .5))

plt.show()

and produce a figure in which the legend sits to the right of the 3D axes, inside the canvas.
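
As a side note, and not what either answer was written against: matplotlib 3.1 and later offer legend_elements() on the object returned by scatter, which builds one handle per distinct value in c using the same cmap and norm as the scatter itself. A minimal sketch, assuming a recent matplotlib and using random data as a stand-in for the PCA output (X, y and names below are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers "3d" on older versions

rng = np.random.default_rng(5)

# Synthetic stand-in for the PCA-transformed iris data: 3 groups of 20 points.
names = ["Setosa", "Versicolour", "Virginica"]
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 3)) + y[:, None]

fig = plt.figure(figsize=(5.5, 3))
ax = fig.add_axes([0, 0, 0.7, 1], projection="3d")
ax.view_init(elev=48, azim=134)
sc = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap="Spectral", edgecolor="k")

# One handle per unique value in c (sorted), colored via sc.cmap and sc.norm.
handles, auto_labels = sc.legend_elements()
legend = ax.legend(handles, names, loc="center left", bbox_to_anchor=(1.0, 0.5))

fig.savefig("pca_legend_sketch.png")
```

Because the handles come back in sorted order of the values in c, the list of names has to follow that same order. If the labels are reordered before plotting, as np.choose does in the code above, the names need to be reordered to match.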

Answer 2

Needed a few tweaks (plt.cm.spectral is the danged weirdest colormap I've ever dealt with), but it seems to be good now:

from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from sklearn import decomposition
from sklearn import datasets

np.random.seed(5)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data  # the floating point values
y = iris.target  # unsigned integers specifying group


fig = plt.figure(1, figsize=(4, 3))
plt.clf()
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=48, azim=134)

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

labelTups = [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]
for name, label in labelTups:
    ax.text3D(X[y == label, 0].mean(),
              X[y == label, 1].mean() + 1.5,
              X[y == label, 2].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(np.float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.spectral, edgecolor='k')

ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])

colors = [plt.cm.spectral(np.float(i/2)) for i in [1, 2, 0]]
custom_lines = [Line2D([0], [0], linestyle="none", marker='.', markeredgecolor='k', markerfacecolor=c, markeredgewidth=.1, markersize=20) for c in colors]
ax.legend(custom_lines, [lt[0] for lt in labelTups], loc='right', bbox_to_anchor=(1.7, .5))

plt.show()

Here's a link to an online Jupyter notebook with a live version of the script (requires an account for rerunning, though).

Short explanation

You're trying to add three legend markers for a single plot, which is nonstandard behavior. Thus, you need to manually create the shapes that your legend will display.

Longer explanation

This line of code recreates the colors you used in your plot:

colors = [plt.cm.spectral(np.float(i/2)) for i in [1, 2, 0]]

and then this line of code draws some appropriate-looking dots that we'll eventually display on your legend:

custom_lines = [Line2D([0], [0], linestyle="none", marker='.', markeredgecolor='k', markerfacecolor=c, markeredgewidth=.1, markersize=20) for c in colors]

The first two args are just the (internal) x and y coords of the single dot that will be drawn, linestyle="none" suppresses the line that Line2D would normally draw by default, and the rest of the args create and style the dot itself (referred to as a marker in the terminology of the matplotlib api).

Finally, this statement actually creates the legend:

ax.legend(custom_lines, [lt[0] for lt in labelTups], loc='right', bbox_to_anchor=(1.7, .5))

The first arg is of course a list of the dots we just drew, and the second arg is a list of the labels (one per dot). The remaining two args tell matplotlib where to draw the actual box containing the legend. The last arg, bbox_to_anchor, is basically a way to manually fiddle with the positioning of the legend, which I had to do since matplotlib support for 3D anything is still a little behind the curve. On 2D plots you typically don't need it, and, since matplotlib usually does a decent job of automatically positioning the legend on 2D plots in the first place, you often don't even need the loc arg either.

Some colormap weirdness

Don't quite know what was going on with plt.cm.spectral, but in order to get it to behave, for every value I fed it I had to:

a) first cast the value to float

b) then divide the value by 2

a) does occur explicitly in the OP's original code, right before they plot. The divide by 2 thing, I don't know where that comes from. Somehow the call to ax.scatter is implicitly normalizing all of the y values so that the maximum is 1? I guess?
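
For what it's worth, both steps can be explained by how matplotlib colormaps are called, and a small sketch makes that visible. It assumes nipy_spectral as a stand-in for the removed spectral map, and a Normalize built from the smallest and largest label, which is what scatter does by default when no vmin, vmax or norm is passed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("nipy_spectral")  # assumption: stand-in for plt.cm.spectral

# a) An int is treated as an index into the colormap's lookup table, a float
#    as a fraction of the 0..1 range, so 1 and 1.0 pick different colors.
int_color = cmap(1)
float_color = cmap(1.0)

# b) scatter rescales c= linearly so that min -> 0 and max -> 1. With labels
#    0, 1, 2 that is (value - 0) / (2 - 0), which is the divide by 2.
labels = [1, 2, 0]
norm = Normalize(vmin=min(labels), vmax=max(labels))
fractions = [float(norm(v)) for v in labels]
colors = [cmap(f) for f in fractions]

# Floats outside 0..1 are clipped to the end colors.
clipped = cmap(1.5)

print(fractions)
print(int_color == float_color, clipped == float_color)
```

The fractions come out as 0.5, 1.0 and 0.0, the same numbers that np.float(i/2) produces for 1, 2 and 0. Asking the scatter object for sc.cmap(sc.norm(i)) gives the same colors without hard-coding the divisor.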

2 answers

solution1
4 ACCEPTED 2018-03-30 00:52:31

solution2
2 2018-03-29 21:12:09
