Why does the figure shown by matplotlib have more than three colors when I have only three labels?

I'm starting to learn how to draw figures with matplotlib. While using the well-known iris dataset to draw a scatter plot, I ran into a question.

import numpy as np
import pandas as pd
import matplotlib.pylab as pl

raw = pd.read_csv('iris.csv')
data = raw.values
print data
x = data[:,0]
y = data[:,1]
pl.scatter(x,y,color = ['r','g','b'], s = [30,40,50], alpha=0.5)
pl.figure()
pl.show()
labels = set(data[:,4])
print labels

I got this output:

 ...
 [6.7 3.3 5.7 2.5 'Iris-virginica']
 [6.7 3.0 5.2 2.3 'Iris-virginica']
 [6.3 2.5 5.0 1.9 'Iris-virginica']
 [6.5 3.0 5.2 2.0 'Iris-virginica']
 [6.2 3.4 5.4 2.3 'Iris-virginica']
 [5.9 3.0 5.1 1.8 'Iris-virginica']]
set(['Iris-virginica', 'Iris-setosa', 'Iris-versicolor'])

I only used the first two features because I didn't know whether it is possible to draw higher-dimensional figures.

This is the figure I got.

The figure has more than three colors, yet, as you can see from the output, there are exactly three labels ('Iris-virginica', 'Iris-setosa', 'Iris-versicolor').

How does matplotlib decide which color to use?
What do the different colors represent?

What should I do to get a plot with exactly three colors?

You obtained this figure with pyplot.scatter, more specifically with this line of code:

pl.scatter(x, y, color=['r','g','b'], s=[30,40,50], alpha=0.5)

Nothing in the line above refers to the labels. x and y are just two lists of numbers.

To color the dots, scatter uses the argument color=['r', 'g', 'b']. If color has the same length as x and y, then each dot gets its own defined color. But if color is shorter than x and y, then scatter loops through color as many times as needed. For example:

x = [1, 2, 3, 4, 5]
color = ['r', 'g', 'b']  # reused as ['r', 'g', 'b', 'r', 'g']
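
The cycling described above can be reproduced in plain Python. This is only a sketch of the behavior, not matplotlib's internal code; cycle_colors is a hypothetical helper name. Newer matplotlib releases may raise an error instead of cycling when the lengths differ, so building the full list yourself is the safer route.

```python
from itertools import cycle, islice


def cycle_colors(colors, n):
    """Repeat `colors` in order until there are exactly n entries.

    Hypothetical helper for illustration; not part of matplotlib.
    """
    return list(islice(cycle(colors), n))


x = [1, 2, 3, 4, 5]
expanded = cycle_colors(['r', 'g', 'b'], len(x))
print(expanded)  # ['r', 'g', 'b', 'r', 'g']
```

Note that the color of each dot then depends only on its position in x, not on its label.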

As for the last mystery, "why are there more than three colors on the plot": the transparency alpha is set to 0.5 (every color is 50% transparent). Some data points share the same x and y coordinates, so their colors overlap and blend, which makes it look as if there are more colors than red, green, and blue.
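
To see why overlapping half-transparent dots produce new colors, here is a small sketch of the standard "over" compositing formula. It assumes pure RGB red and green on a white background, which approximates but does not exactly match what the renderer does with matplotlib's own 'r' and 'g'; blend_over is a hypothetical helper name.

```python
def blend_over(top, bottom, alpha):
    """Composite `top` over `bottom` with opacity `alpha` (standard 'over' rule).

    Hypothetical helper for illustration; colors are (r, g, b) tuples in [0, 1].
    """
    return tuple(alpha * t + (1 - alpha) * b for t, b in zip(top, bottom))


white = (1.0, 1.0, 1.0)  # assumed background color
red = (1.0, 0.0, 0.0)    # pure red, assumed for simplicity
green = (0.0, 1.0, 0.0)  # pure green, assumed for simplicity

# A single red dot at alpha=0.5 on a white background
red_on_white = blend_over(red, white, 0.5)

# A green dot at alpha=0.5 drawn on top of that red dot
green_on_red = blend_over(green, red_on_white, 0.5)

print(red_on_white)  # (1.0, 0.5, 0.5)
print(green_on_red)  # (0.5, 0.75, 0.25)
```

The second result is neither red, green, nor blue, which is the kind of extra color that shows up where dots overlap.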


To plot the right colors, you need to use the label information. The question "Python scatter plot with colors corresponding to strings" should help you.
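
One possible way to do this is to build a list with one color per point from the labels and pass it to scatter. This is a minimal sketch: the label-to-color mapping and the names COLOR_BY_LABEL, colors_for, and plot_by_label are assumptions made for illustration, and the fixed marker size of 40 is arbitrary.

```python
# Hypothetical mapping: one color per label from the question's output
COLOR_BY_LABEL = {
    'Iris-setosa': 'r',
    'Iris-versicolor': 'g',
    'Iris-virginica': 'b',
}


def colors_for(labels):
    """Return one color per data point, chosen by that point's label."""
    return [COLOR_BY_LABEL[label] for label in labels]


def plot_by_label(x, y, labels):
    """Scatter plot in which each dot's color follows its label."""
    # Imported here so the mapping above can be used without matplotlib installed
    import matplotlib.pyplot as plt

    plt.scatter(x, y, c=colors_for(labels), s=40, alpha=0.5)
    plt.show()


sample_labels = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa']
sample_colors = colors_for(sample_labels)
print(sample_colors)  # ['r', 'b', 'g', 'r']
```

With the data from the question, the call would look something like plot_by_label(data[:, 0].astype(float), data[:, 1].astype(float), data[:, 4]). Because the color list has the same length as x and y, no cycling takes place, and overlapping dots of the same species blend only with their own color.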
