
Major Difference in 2D Kernel Density Plots: Seaborn and R

I am trying to plot data using the 2D kernel density plot of Seaborn's jointplot function, with statsmodels' KDEMultivariate used to calculate a data-driven bandwidth. I plotted a 2D kernel density in R with the 'ks' package using the same data and the result looks very good, while the Seaborn plot looks very different.

I am using exactly the same data and the same bandwidth for each (taking the bandwidth given by KDEMultivariate and passing it to the R method).

Here is the input.csv data used: https://app.box.com/s/ot7d36t44wrr85pusp5657pc1w2kf5hj

Below is the code used in each and the output image from each.

Python / Seaborn:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

data = pd.read_csv("input.csv", dtype={'x': float, 'y': float}, skiprows=0)

# Data-driven bandwidth for each axis via maximum-likelihood cross-validation
bw_ml_x = sm.nonparametric.KDEMultivariate(data=data['x'], var_type='c', bw='cv_ml')
bw_ml_y = sm.nonparametric.KDEMultivariate(data=data['y'], var_type='c', bw='cv_ml')

g = sns.jointplot(x='x', y='y', data=data, kind="kde", stat_func=None, bw=[bw_ml_x.bw, bw_ml_y.bw])

# Overlay the raw points, then hide the first collection on the joint axes
g.plot_joint(plt.scatter, c="w")
g.ax_joint.collections[0].set_alpha(0)

plt.show()

Image for Seaborn plot:

The bandwidths given by bw_ml_x.bw and bw_ml_y.bw are placed in a 2 x 2 R matrix H, where H[1,1] = bw_ml_x.bw, H[2,2] = bw_ml_y.bw, and the other values are set to zero.

R:

library(ks)
fhat <- kde(x=as.data.frame(data[1], data[2]), H=H)
plot(fhat, display="filled.contour2", cont=seq(10,90,by=10))

Image for R plot:

Looking at your Seaborn/Python plot, many of the points cluster along the (0, n) region and the (1, 1) region of your space, just as the KDE in the R plot shows. This indicates that Seaborn and R are looking at the same data; we simply need to reformulate the call to the KDE in Seaborn in order to visualize the density gradients.

If you modify your Python call to match the documentation for kernel density estimation in Seaborn, you'll get a proper 2D KDE out of Python:

import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
import seaborn as sns

data = pd.read_csv("input.csv", dtype={'x': float, 'y': float}, skiprows=0)

# These bandwidths are still computed, but they are no longer passed to jointplot
bw_ml_x = sm.nonparametric.KDEMultivariate(data=data['x'], var_type='c', bw='cv_ml')
bw_ml_y = sm.nonparametric.KDEMultivariate(data=data['y'], var_type='c', bw='cv_ml')

# Let Seaborn choose its own bandwidth for the joint KDE
g = sns.jointplot(x='x', y='y', data=data, kind="kde")

g.plot_joint(plt.scatter, c="w")
g.ax_joint.collections[0].set_alpha(0)

plt.show()

[Image: resulting Seaborn plot]

This accords with the R plot (though the kernel estimators seem to be slightly different, which would account for the variation in gradients between the plots):

[Image: R plot]
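To see how much of the remaining difference comes from the estimator rather than the data, it can help to evaluate a 2D KDE by hand with explicit per-axis bandwidths and compare it against both plots. The following is a minimal NumPy sketch of a Gaussian product-kernel estimator, not the code that Seaborn or ks runs internally. The function name `product_gaussian_kde`, the synthetic data, and the bandwidth value 0.4 are assumptions made for illustration; with the real data you would pass `data['x']`, `data['y']` and the cross-validated bandwidths instead.

```python
import numpy as np

def product_gaussian_kde(x, y, hx, hy, grid_x, grid_y):
    """Evaluate a 2D KDE with a Gaussian product kernel on a grid.

    hx and hy are per-axis bandwidths on the standard-deviation scale.
    Returns an array of shape (len(grid_y), len(grid_x)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardised distance from every grid point to every observation
    ux = (grid_x[:, None] - x[None, :]) / hx
    uy = (grid_y[:, None] - y[None, :]) / hy
    # 1D Gaussian kernels along each axis
    kx = np.exp(-0.5 * ux**2) / (np.sqrt(2 * np.pi) * hx)
    ky = np.exp(-0.5 * uy**2) / (np.sqrt(2 * np.pi) * hy)
    # Average the product of the two kernels over the observations
    return ky @ kx.T / x.size

# Synthetic stand-in data (assumption: input.csv is not available here)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

grid_x = np.linspace(-6, 6, 121)
grid_y = np.linspace(-6, 6, 121)
density = product_gaussian_kde(x, y, 0.4, 0.4, grid_x, grid_y)

# Sanity check: the density should integrate to roughly 1 over the grid
cell_area = (grid_x[1] - grid_x[0]) * (grid_y[1] - grid_y[0])
mass = density.sum() * cell_area
print(round(mass, 3))
```

The resulting `density` array can be drawn with `plt.contourf(grid_x, grid_y, density)`. One thing worth checking when porting bandwidths to R: as far as I know, the ks package treats H as a variance-type matrix, so per-axis bandwidths on the standard-deviation scale would go into the diagonal squared. Please confirm that against the ks documentation before relying on it.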
