
pandas scatter plot colors with three points and seaborn

There is a strange behavior when using pandas and seaborn to plot a scatter plot that has only three points: the points don't all have the same color. The problem disappears when seaborn is not loaded, when there are more than three points, or when plotting with matplotlib's scatter method directly. See the following example:

from pandas import DataFrame  # 0.16.0
import matplotlib.pyplot as plt  # 1.4.3
import seaborn as sns  # 0.5.1; loading it changes matplotlib's rcParams
import numpy as np  # 1.9.2

# Three points: with seaborn loaded, the points get different colours
df = DataFrame({'x': np.random.uniform(0, 1, 3), 'y': np.random.uniform(0, 1, 3)})
df.plot(kind='scatter', x='x', y='y')
plt.show()

# Four points: all points share one colour
df = DataFrame({'x': np.random.uniform(0, 1, 4), 'y': np.random.uniform(0, 1, 4)})
df.plot(kind='scatter', x='x', y='y')
plt.show()

I've tracked down the bug. Technically it is in pandas, not in seaborn as I originally thought, although it involves code from pandas, seaborn, and matplotlib.

In pandas.tools.plotting.ScatterPlot._make_plot, the following code chooses the colours to be used in the scatter plot:

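if c is None:
    # no colour given: fall back to the default patch face colour
    c_values = self.plt.rcParams['patch.facecolor']
elif c_is_column:
    # c names a column: use that column's values
    c_values = self.data[c].values
else:
    # anything else is passed through to matplotlib unchanged
    c_values = c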
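The branch above can be restated as a small standalone function, which makes it easy to see that c=None simply forwards whatever rcParams['patch.facecolor'] happens to hold. This is a sketch: choose_c_values and its parameters are hypothetical names, plain dicts stand in for both the DataFrame and rcParams, and the column test is an assumption about how c_is_column is computed.

```python
def choose_c_values(c, data, rcparams):
    """Hypothetical stand-in for the colour branch quoted above.

    data:     dict of column name -> list of values (stands in for the DataFrame)
    rcparams: dict standing in for matplotlib's rcParams
    """
    # Assumption: c refers to a column when it is a hashable key of data
    try:
        c_is_column = c in data
    except TypeError:  # unhashable c, e.g. a list of colours
        c_is_column = False

    if c is None:
        # No colour given: whatever the global default is gets forwarded as-is
        return rcparams['patch.facecolor']
    elif c_is_column:
        # c names a column: its values will later be colormapped
        return data[c]
    else:
        # Anything else is handed to matplotlib unchanged
        return c


data = {'x': [0.1, 0.5, 0.9], 'y': [0.3, 0.2, 0.8]}
print(choose_c_values(None, data, {'patch.facecolor': 'b'}))              # b
print(choose_c_values(None, data, {'patch.facecolor': (0.57, 0.78, 1.0)}))  # (0.57, 0.78, 1.0)
```

With matplotlib's own default the function returns a colour string; with seaborn loaded it returns a 3-tuple of floats, and that tuple is what reaches ax.scatter.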

In your case c is None, the default value, so c_values is taken from plt.rcParams['patch.facecolor'].

Now, as part of setting itself up, seaborn changes plt.rcParams['patch.facecolor'] to (0.5725490196078431, 0.7764705882352941, 1.0), which is an RGB tuple. If seaborn is not used, the value is the matplotlib 1.4 default, 'b' (a string meaning the colour blue).
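To see the type difference without installing seaborn, you can put the same tuple into rcParams temporarily. This sketch assumes that setting patch.facecolor by hand is an adequate stand-in for what seaborn 0.5.x does when it sets itself up; it uses matplotlib's rc_context, which restores the previous settings on exit, and the function name is hypothetical.

```python
import matplotlib

# Value that, per the text above, seaborn stores in rcParams['patch.facecolor']
SEABORN_LIKE_FACECOLOR = (0.5725490196078431, 0.7764705882352941, 1.0)


def facecolor_inside_and_after():
    """Read patch.facecolor inside a simulated seaborn setting and after leaving it."""
    with matplotlib.rc_context({'patch.facecolor': SEABORN_LIKE_FACECOLOR}):
        inside = matplotlib.rcParams['patch.facecolor']
    # rc_context restores the previous settings when the block exits
    after = matplotlib.rcParams['patch.facecolor']
    return inside, after


inside, after = facecolor_inside_and_after()
print(type(inside).__name__, len(inside))  # tuple 3
print(repr(after))  # matplotlib's own default, a colour string
```

The tuple has exactly three elements, which is why a three-point plot is the one case where it can be confused with per-point data.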

c_values is then used later on to actually draw the points, in the call to ax.scatter:

scatter = ax.scatter(data[x].values, data[y].values, c=c_values,
                     label=label, cmap=cmap, **self.kwds)

The issue arises because the keyword argument c accepts several different types of argument:

  • a string (such as 'b' in the original matplotlib case);
  • a sequence of color specifications (say, a sequence of RGB values);
  • a sequence of values to map onto the current colormap.
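A quick way to check which of these three readings matplotlib chose is the collection's get_array() method, which returns the data being colormapped, or None when the colours were given directly. The sketch below assumes a recent matplotlib release and uses the Agg backend so nothing is displayed; it passes each form of c for three points.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; nothing is displayed
import matplotlib.pyplot as plt

x = [0.1, 0.5, 0.9]
y = [0.3, 0.2, 0.8]

fig, ax = plt.subplots()
# 1. a single colour format string
as_string = ax.scatter(x, y, c='b')
# 2. a sequence of N colour specifications (here N identical RGB tuples)
as_colors = ax.scatter(x, y, c=[(0.0, 0.0, 1.0)] * 3)
# 3. a sequence of N numbers, mapped through the colormap
as_values = ax.scatter(x, y, c=[0.0, 0.5, 1.0], cmap='Greys')
plt.close(fig)

# get_array() holds the mapped data when a colormap is used, otherwise None
for name, coll in [('string', as_string), ('colors', as_colors), ('values', as_values)]:
    print(name, coll.get_array() is not None)
# string False
# colors False
# values True
```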

The matplotlib docs specifically state the following:

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
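With three points, the ambiguity described in the docs can be reproduced directly in matplotlib, together with two unambiguous alternatives: a 2-D array with a single RGB row, and the color keyword. This is a sketch assuming a recent matplotlib release (older releases may treat the single-row case differently when the row length equals the number of points). get_array() returns the colormapped data, or None when the points were given a fixed colour.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; nothing is displayed
import matplotlib.pyplot as plt

rgb = (0.5725490196078431, 0.7764705882352941, 1.0)  # seaborn's value from the text
x = [0.1, 0.5, 0.9]
y = [0.3, 0.2, 0.8]

fig, ax = plt.subplots()
# A bare RGB tuple whose length equals the number of points is colormapped
bare = ax.scatter(x, y, c=rgb)
# A 2-D array with a single RGB row is read as one colour for every point
one_row = ax.scatter(x, y, c=[rgb])
# The color keyword never goes through the colormap
kw_color = ax.scatter(x, y, color=rgb)
plt.close(fig)

print(bare.get_array() is not None)      # True
print(one_row.get_array() is not None)   # False
print(kw_color.get_array() is not None)  # False
```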

What happens is that matplotlib takes c_values (a tuple of three numbers) and, because its length matches the number of points, treats those numbers as data values to map onto the current colormap (which pandas sets to Greys by default). As a result, you get three scatter points with different shades of grey. When there are more than three scatter points, the length of the tuple no longer matches the length of the data arrays (3 != 4), so matplotlib concludes that it must be an RGB tuple and uses it as a single constant RGB colour.
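The arithmetic behind the three shades of grey can be checked by hand: the tuple is 146/255, 198/255 and 255/255, so scaling it to [0, 1] gives 0, 52/109 ≈ 0.477 and 1. The sketch below repeats that with matplotlib's Normalize and the Greys colormap, then passes the same tuple to ax.scatter for three and four points. It assumes matplotlib 3.5 or later (for matplotlib.colormaps) and that scaling to the data range is what applies when no vmin/vmax are given.

```python
import warnings

import matplotlib
matplotlib.use('Agg')  # headless backend; nothing is displayed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# seaborn's facecolor, misread as three data values
values = np.array([0.5725490196078431, 0.7764705882352941, 1.0])

# Scale the values to [0, 1] (the range Normalize() picks when autoscaling)
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = np.asarray(norm(values))
# Look the scaled values up in the Greys colormap: one RGBA row per point
rgba = matplotlib.colormaps['Greys'](scaled)
print(scaled.round(3))
print(rgba[:, 0].round(2))  # red channel: from white through grey to black

# Same tuple, three versus four points, straight through ax.scatter
fig, ax = plt.subplots()
with warnings.catch_warnings():
    # the four-point call warns that c looks like a single RGB colour
    warnings.simplefilter('ignore')
    three = ax.scatter([0, 1, 2], [0, 1, 2], c=tuple(values), cmap='Greys')
    four = ax.scatter([0, 1, 2, 3], [0, 1, 2, 3], c=tuple(values), cmap='Greys')
plt.close(fig)

print(three.get_array() is not None)  # True: colormapped, three greys
print(four.get_array() is not None)   # False: one constant colour
```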

This has been written up as a bug report on the pandas GitHub issue tracker.
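To sidestep the problem in your own code (a suggestion, not part of the original answer), pass the colour explicitly so that nothing is read from rcParams. The sketch below wraps seaborn's RGB value in a one-row list, the "2-D array in which the rows are RGB" form that the matplotlib docs quoted above allow. It assumes recent pandas and matplotlib versions; a plain colour string such as c='b' avoids the ambiguity as well.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; nothing is displayed
import numpy as np
import pandas as pd

rgb = (0.5725490196078431, 0.7764705882352941, 1.0)  # seaborn's value from the text
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.uniform(0, 1, 3), 'y': rng.uniform(0, 1, 3)})

# A one-row 2-D list is not a column name, so pandas passes it through unchanged,
# and matplotlib reads it as a single RGB colour rather than three data values
ax = df.plot(kind='scatter', x='x', y='y', c=[rgb])
points = ax.collections[0]
print(points.get_array() is None)  # True: no colormapping, all points share one colour
```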

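You might want to try importing seaborn this way, which gives you seaborn's functions without applying seaborn's style to matplotlib (seaborn.apionly exists only in older seaborn releases; newer versions no longer change the style on import):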

import seaborn.apionly as sns
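If you prefer to keep a plain import seaborn, another workaround (not taken from the linked question) is to give patch.facecolor a string value just around the pandas call, using matplotlib's rc_context. The sketch simulates seaborn's setting rather than importing seaborn, and scatter_with_string_default is a hypothetical helper name.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; nothing is displayed
import numpy as np
import pandas as pd

# Stand-in for the tuple seaborn 0.5.x leaves in rcParams['patch.facecolor']
SEABORN_LIKE = (0.5725490196078431, 0.7764705882352941, 1.0)


def scatter_with_string_default(df):
    """Plot a pandas scatter while patch.facecolor is temporarily a colour string."""
    with matplotlib.rc_context({'patch.facecolor': 'b'}):
        return df.plot(kind='scatter', x='x', y='y')


rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.uniform(0, 1, 3), 'y': rng.uniform(0, 1, 3)})

# Simulate the seaborn setting, then plot with the string default restored locally
with matplotlib.rc_context({'patch.facecolor': SEABORN_LIKE}):
    ax = scatter_with_string_default(df)
print(ax.collections[0].get_array() is None)  # True: one constant colour
```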

See this related question for more details.
