kde2d density comparison

I have a question about kde2d (kernel density estimator). I am computing two kde2d estimates for two different data sets in the same variable space. When I compare them with filled.contour2 or contour, the set that looks sparser in a scatter plot (it also has fewer points in total, by a factor of 10) shows higher density values on its contours. I expected the set with the higher point density to have the higher contour values, but that is not the case. Does this have to do with the choice of bandwidth (h)? I am using the same h for both, and I tried changing it, but the result did not change much. What is my error?

An example:

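library(MASS)  # provides kde2d

# Small sample (1,000 points) and large sample (100,000 points),
# both uniform on [5, 7.5] x [2, 3]
a <- runif(1000, 5.0, 7.5)
b <- runif(1000, 2.0, 3.0)
c <- runif(100000, 5.0, 7.5)
d <- runif(100000, 2.0, 3.0)

# 2D kernel density estimates on a 100 x 100 grid, same bandwidth for both
abdens <- kde2d(a, b, n = 100, h = 0.5)
cddens <- kde2d(c, d, n = 100, h = 0.5)

mylevels <- seq(-2.4, 30, 0.9)
# filled.contour2 is not in base R or MASS; it is assumed to be defined elsewhere
filled.contour2(abdens, xlab = "a", ylab = "b", xlim = c(5, 7.5), ylim = c(2, 3),
                col = hsv(seq(0, 1, length = length(mylevels))))

# Scatter plots first, then the contours drawn on top
plot(a, b)
contour(abdens, nlevels = 5, add = TRUE, col = "blue")
plot(c, d)
contour(cddens, nlevels = 5, add = TRUE, col = "orange")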

I'm not sure I agree that the densities should be different in the uniform case. I would have expected a set with a larger number of points drawn at random from a Normal distribution to have more support in the extreme regions and therefore a lower (estimated) density in the center. That effect might also occasionally be apparent with bivariate Uniform draws of 1,000 points versus 100,000. I hope my modifications to your code are acceptable. It's easier to see the contours if they are drawn after the plots.

(The theoretical density is the same in both cases, since a density is normalized to integrate to 1.0 regardless of how many points were sampled. We are only looking at estimates, with some expected artifacts from "edge" effects. In the case of univariate densities, information about the bounds can be added with the density functions in the logspline package.)
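The normalization point can be checked numerically. The following is a minimal sketch, not part of the original code: the seed, the sample sizes, the widened grid limits `lims`, and the helper names `grid_integral` and `interior_mean` are all assumptions chosen for illustration. It estimates both densities on a common grid and compares their integrals and their average height away from the edges with the true uniform density 1 / (2.5 * 1) = 0.4.

```r
library(MASS)  # provides kde2d

set.seed(42)  # arbitrary seed, only for reproducibility

# Two samples from the same bivariate uniform distribution on
# [5, 7.5] x [2, 3]; the sizes are illustrative, not the ones in the question.
n_small <- 1000
n_large <- 20000
xs <- runif(n_small, 5.0, 7.5); ys <- runif(n_small, 2.0, 3.0)
xl <- runif(n_large, 5.0, 7.5); yl <- runif(n_large, 2.0, 3.0)

# Common grid that extends past the data, so the kernel tails beyond the
# sampled rectangle are included in the integral (assumed limits).
lims <- c(4.5, 8.0, 1.5, 3.5)
dens_small <- kde2d(xs, ys, h = 0.5, n = 50, lims = lims)
dens_large <- kde2d(xl, yl, h = 0.5, n = 50, lims = lims)

# Riemann-sum approximation of the integral of the estimated density.
grid_integral <- function(d) {
  sum(d$z) * diff(d$x[1:2]) * diff(d$y[1:2])
}

# Average estimated density well inside the rectangle, away from edge effects.
# Rows of d$z correspond to d$x and columns to d$y.
interior_mean <- function(d) {
  ix <- d$x > 5.5 & d$x < 7.0
  iy <- d$y > 2.4 & d$y < 2.6
  mean(d$z[ix, iy])
}

int_small <- grid_integral(dens_small)
int_large <- grid_integral(dens_large)

cat("integral, small sample:", int_small, "\n")
cat("integral, large sample:", int_large, "\n")
cat("interior mean, small sample:", interior_mean(dens_small), "\n")
cat("interior mean, large sample:", interior_mean(dens_large), "\n")
cat("max of estimate, small vs large:", max(dens_small$z), max(dens_large$z), "\n")
```

Both integrals should come out close to 1 and both interior means close to 0.4, whatever the sample size. The smaller sample gives a noisier surface, so its local peaks (and hence its highest contour levels) may well exceed those of the larger sample even though the underlying density is identical.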
