
Plotting a dataframe with seaborn.pairplot() in multiple colors?

I want to create a plot similar to this image in order to compare multiple dimensions of my dataset. The dataset is not one of the preset ones. I managed to display the data correctly in one color, but I want one color for y=0 and one for y=1 to compare the points, just like in the image of the iris dataset. As soon as I include hue='y' in the sns.pairplot call, the code no longer runs to the end.

Also, I don't understand the console output. What's the issue?

import seaborn as sns; sns.set(style="ticks", color_codes=True)
import pandas as pd

dataframe = pd.DataFrame(dict(F1=X[:, 0], F2=X[:, 1], F3=X[:, 2], F4=X[:, 3], y=y))

print(dataframe)

g = sns.pairplot(dataframe, hue='y')

This is the output for the dataframe. It looks alright to me:

            F1        F2        F3        F4    y
0     3.173182  2.849991  2.497907  2.851715  0.0
1     2.468625 -0.216985  0.275206  1.232518  1.0
2     2.398419  2.258931  2.255533  4.895872  0.0
3     1.379937  1.041677  1.165911  1.992650  1.0
4     2.489665  2.269068  4.129961  2.218203  0.0
5     4.140160  2.809088  2.973027  3.553128  0.0
6     2.997969  1.701299  2.978875  1.946793  0.0
7     3.864436  3.554276  3.568455  2.839489  0.0
8    -0.000605  1.376971  1.128350  1.293777  1.0
9     2.398057  1.180861  2.400801  2.264726  1.0
10    0.997385 -0.560205  0.954628  2.788858  1.0

...        ...       ...       ...       ...  ...

3990  3.334553  4.576306  2.470476  3.032781  0.0
3991  1.465784  2.304793  1.267303 -0.030802  1.0
3992  0.505905 -0.280769 -1.223464  1.077305  1.0
3993  2.581596  3.924394  3.878303  2.579366  0.0
3994  4.362067  2.247818  2.948595  1.906314  0.0
3995  2.310546  0.006672  2.382227  1.940343  1.0
3996 -0.944635  1.387136  0.604135  2.421478  1.0
3997  1.290999  1.485965  0.262792  0.899340  1.0
3998  0.864532  1.759607  1.118346  1.038935  1.0
3999  1.819110  2.218838  3.927945  2.593009  0.0

[4000 rows x 5 columns]

But eventually I receive this error:

Traceback (most recent call last):
  File "/Users//PycharmProjects//V3_multiTops/vergleich.py", line 131, in <module>
    g = sns.pairplot(dataframe, hue='y')
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 2111, in pairplot
    grid.map_diag(kdeplot, **diag_kws)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/axisgrid.py", line 1399, in map_diag
    func(data_k, label=label_k, color=color, **kwargs)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 691, in kdeplot
    cumulative=cumulative, **kwargs)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 294, in _univariate_kdeplot
    x, y = _scipy_univariate_kde(data, bw, gridsize, cut, clip)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/seaborn/distributions.py", line 366, in _scipy_univariate_kde
    kde = stats.gaussian_kde(data, bw_method=bw)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 172, in __init__
    self.set_bandwidth(bw_method=bw_method)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 499, in set_bandwidth
    self._compute_covariance()
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/stats/kde.py", line 510, in _compute_covariance
    self._data_inv_cov = linalg.inv(self._data_covariance)
  File "/Users//PycharmProjects//venv/lib/python3.7/site-packages/scipy/linalg/basic.py", line 975, in inv
    raise LinAlgError("singular matrix")
numpy.linalg.linalg.LinAlgError: singular matrix

I think I am doing something wrong with sns.pairplot() that I don't understand yet. Can you explain it to me, please?

The problem seems to be that the "y" column itself is numeric. It would hence be included in the pairgrid as a column/row. Within each hue group that column is constant, which would explain the singular matrix raised by the KDE on the diagonal. Having "y" in the grid seems undesired anyway. To select the variables that take part in the grid, use pairplot's vars keyword.

 sns.pairplot(df, vars=df.columns[:-1], hue="y")

The reason the iris dataset works without specifying vars is that its hue column is not numeric. Non-numeric columns are not included in the grid.
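To see why a numeric hue column trips up the diagonal KDE, it may help to look at the variance of "y" inside each hue group. The sketch below uses made-up random data shaped like the question's dataframe (the values, the seed and the helper names y_var_by_level and f1_var_by_level are assumptions, not part of the original post). A sample with zero variance has a singular covariance, which is what the gaussian_kde call in the traceback ends up trying to invert.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=["F1", "F2", "F3", "F4"])
df["y"] = rng.choice([0.0, 1.0], size=len(df))

levels = sorted(df["y"].unique())

# Variance of "y" itself inside each hue group: every value in a group
# equals the group label, so the variance is zero.
y_var_by_level = {level: df.loc[df["y"] == level, "y"].var() for level in levels}

# Variance of an ordinary feature inside each hue group, for comparison.
f1_var_by_level = {level: df.loc[df["y"] == level, "F1"].var() for level in levels}

print("var(y) per hue level: ", y_var_by_level)
print("var(F1) per hue level:", f1_var_by_level)
```

The feature columns keep a positive variance in every group, so only the "y" row/column of the grid is affected, which fits with the fix of leaving it out via vars.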

Complete example:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(300, 4), columns=[f"F{i+1}" for i in range(4)])
df["y"] = np.random.choice([1., 0.], size=len(df))

sns.pairplot(df, vars=df.columns[:-1], hue="y")
plt.show()
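The example above picks the feature columns by position (df.columns[:-1]), which relies on "y" being the last column. A possible variation, sketched here under the assumption that every numeric column except the hue column should appear in the grid, builds the list by dtype instead. The names hue_col and feature_cols are illustrative only.

```python
import numpy as np
import pandas as pd

# Same shape as the complete example, with made-up random values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(300, 4)), columns=[f"F{i+1}" for i in range(4)])
df["y"] = rng.choice([1.0, 0.0], size=len(df))

hue_col = "y"

# All numeric columns except the one used for coloring.
feature_cols = [c for c in df.select_dtypes(include="number").columns if c != hue_col]
print(feature_cols)  # ['F1', 'F2', 'F3', 'F4']

# The list can then be handed to pairplot, e.g.:
# sns.pairplot(df, vars=feature_cols, hue=hue_col)
```

This keeps working if the column order changes or further non-numeric columns are added to the dataframe.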

