
How to change the number of bins in seaborn's pairplot() function?

I have a data set of 36000 rows and 51 columns. Each row is an observation, and the first 50 columns are 50 different features of that observation. The 51st column holds the values 0 or 1, where 0 means the observation belongs to class A and 1 means it belongs to class B.

Now let's say I want to make a histogram of the values in the first column, call it Feature1. As far as I know, matplotlib's plt.hist() can't draw two histograms in the same plot, one for the Feature1 values of class A and the other for those of class B. Seaborn's sns.distplot can't do it either. So I decided to try seaborn's pairplot, as follows:

sns.pairplot(df, vars=["Feature1"], hue="Class", diag_kind="hist", diag_kws=dict(alpha=0.55))

Feature1 is the name of the first column and Class is the name of the last column, which holds the class label of each observation. The resulting histogram looks fine, but I would like to increase the number of bins. Unfortunately, I couldn't find any way to do that with this particular function.

Is anyone aware of a solution to this problem? Thanks

To expand on Bugbeeb's comment: when you use diag_kind="hist", the diag_kws dictionary is passed on to plt.hist(). This isn't spelled out in the documentation, but it is clear from the source. (This describes seaborn releases before 0.11; from 0.11 on, the diagonal is drawn with sns.histplot, which also accepts bins, so the fix below works either way.)

def pairplot(...):
    # ...
    if diag_kind == "hist":
        grid.map_diag(plt.hist, **diag_kws)
    # ...
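Because pairplot builds a PairGrid and forwards diag_kws through map_diag, the same forwarding can be sketched by building the grid by hand and passing bins straight to plt.hist. This is a minimal sketch, not the answer author's code: the DataFrame is synthetic (an assumption standing in for the question's 36000 x 51 data), and n_bins = 40 is an arbitrary illustrative value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the question's data (assumption): one feature, two classes.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Feature1": np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)]),
    "Class": np.repeat([0, 1], 500),
})

n_bins = 40  # illustrative bin count

# map_diag passes extra keyword arguments on to the plotting function,
# here plt.hist, once per hue level (class 0, then class 1).
g = sns.PairGrid(df, vars=["Feature1"], hue="Class")
g.map_diag(plt.hist, alpha=0.55, bins=n_bins)
```

Note that with plt.hist, an integer bins value is applied to each class separately, so each class gets its own n_bins edges over its own range. Passing an explicit array of edges, for example np.linspace(df["Feature1"].min(), df["Feature1"].max(), n_bins + 1), is one way to make both classes share the same bins.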

Since plt.hist() accepts an integer bins argument that controls the number of bins, you can simply do:

sns.pairplot(df, vars=["Feature1"], hue="Class", diag_kind="hist",
             diag_kws={"alpha": 0.55, "bins": n})

where n is the desired number of bins, given as an int.
