
Why does the figure shown by matplotlib have more than three colors when I have only three labels?

I'm starting to learn how to use matplotlib to draw figures. While using the famous iris dataset to draw a scatter plot, I ran into a problem.

import numpy as np
import pandas as pd
import matplotlib.pylab as pl

raw = pd.read_csv('iris.csv')
data = raw.values
print data
x = data[:,0]
y = data[:,1]
pl.scatter(x,y,color = ['r','g','b'], s = [30,40,50], alpha=0.5)
pl.figure()
pl.show()
labels = set(data[:,4])
print labels

I got the output

 ...
 [6.7 3.3 5.7 2.5 'Iris-virginica']
 [6.7 3.0 5.2 2.3 'Iris-virginica']
 [6.3 2.5 5.0 1.9 'Iris-virginica']
 [6.5 3.0 5.2 2.0 'Iris-virginica']
 [6.2 3.4 5.4 2.3 'Iris-virginica']
 [5.9 3.0 5.1 1.8 'Iris-virginica']]
set(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor'])

I only used the first two features because I didn't know whether it is possible to draw high-dimensional figures.

This is the figure I got: [image: scatter plot of the first two features]

There were more than three colors, although, as you can see from the output, there were exactly three labels ('Iris-virginica', 'Iris-setosa', 'Iris-versicolor').

How does matplotlib decide what color to use?
What are the different colors for?

What should I do to show a three-color plot figure?

You obtained this figure with pyplot.scatter, more specifically with this line of code:

pl.scatter(x, y, color=['r','g','b'], s=[30,40,50], alpha=0.5)

In the line above, there is no indication whatsoever of the labels. x and y are just two lists of numbers.

To color the dots, scatter uses the argument color=['r', 'g', 'b']. If color has the same length as x and y, then each dot gets its own color. But if color is shorter than x and y, then scatter (in the matplotlib version used here) cycles through color as many times as needed. For example:

x = [1, 2, 3, 4, 5]
color = ['r', 'g', 'b']  # is cycled to ['r', 'g', 'b', 'r', 'g']
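
The cycling described above can be imitated in plain Python. This is only a sketch of the idea, not matplotlib's actual code; the helper name cycle_to_length is made up for illustration.

```python
from itertools import cycle, islice


def cycle_to_length(values, n):
    """Repeat `values` from the start until exactly `n` items are produced."""
    return list(islice(cycle(values), n))


x = [1, 2, 3, 4, 5]
color = ['r', 'g', 'b']

# One color per point, reusing the short list from the beginning.
per_point = cycle_to_length(color, len(x))
print(per_point)  # ['r', 'g', 'b', 'r', 'g']
```

With the 150 iris rows, this pattern means the color of a dot depends only on its row position modulo 3, not on its species.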

As for the last mystery, "why are there more than three colors on the plot": it's because the transparency alpha is set to 0.5 (all colors are 50% transparent). Some of the data points have the same x and y coordinates, so their colors are drawn on top of each other and blend, which makes it look like there are more colors than red, green, and blue.
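
To see how two half-transparent dots can produce a new color, here is a rough arithmetic sketch. It assumes simple "over" compositing on a white background and pure red and blue RGB values; the function blend_over is hypothetical, and the exact colors matplotlib renders may differ.

```python
def blend_over(top, bottom, alpha):
    """Composite RGB color `top` over `bottom` with opacity `alpha`."""
    return tuple(alpha * t + (1 - alpha) * b for t, b in zip(top, bottom))


WHITE = (1.0, 1.0, 1.0)  # assumed background color
RED = (1.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 1.0)

# A single red dot at alpha=0.5 already looks pink on white.
red_on_white = blend_over(RED, WHITE, 0.5)
print(red_on_white)  # (1.0, 0.5, 0.5)

# A blue dot drawn on top of it gives a purplish color that is neither red nor blue.
blue_on_red = blend_over(BLUE, red_on_white, 0.5)
print(blue_on_red)  # (0.5, 0.25, 0.75)
```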


To plot the right colors, you need to use the label information. The question "Python scatter plot with colors corresponding to strings" should help you.
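
One possible way to do this, as a minimal sketch: build a list with one color per row from the label column and pass it to scatter. The names LABEL_COLORS, colors_for and scatter_by_label are made up for this example, and the color assigned to each species is an arbitrary choice.

```python
# Hypothetical mapping from species label to color; the choices are arbitrary.
LABEL_COLORS = {
    'Iris-setosa': 'r',
    'Iris-versicolor': 'g',
    'Iris-virginica': 'b',
}


def colors_for(labels, palette=LABEL_COLORS):
    """Return one color per data point, looked up from its label."""
    return [palette[label] for label in labels]


def scatter_by_label(x, y, labels):
    """Draw a scatter plot whose color list has the same length as x and y."""
    # Imported here so that colors_for can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    plt.scatter(x, y, c=colors_for(labels), s=40, alpha=0.5)
    plt.show()
```

With the data array from the question, the call would look something like scatter_by_label(data[:, 0].astype(float), data[:, 1].astype(float), data[:, 4]). Because the color list now has one entry per row, each species gets a single color, although overlapping dots can still blend while alpha is below 1.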
