Confusion about 2-dimensional kernel density estimation in R

A kernel density estimator is used to estimate a particular probability density function (see the mvstat.net and scikit-learn docs for references).

My confusion is about what exactly kde2d() does. Does it estimate the joint probability density function f(a,b) of the two random variables in the example below? And what does the color mean?

Here is the code example I am referring to.

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n=100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
filled.contour(density, color.palette=colour_flow)

What is a kernel density estimator? Essentially, it places a little normal density curve on every data point (the center of the normal density being that point) and then adds up all these little normal densities, rescaled so that the total area is 1, to obtain the kernel density estimate.
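The "one little normal curve per point, then add them up" idea can be sketched by hand in a few lines. This is only an illustration under assumptions of my own: the sample x, the evaluation grid and the choice of bw.nrd0() as bandwidth are not from the question. density() computes its result with a binned approximation, so the two curves should agree closely rather than exactly.

```r
set.seed(1)
x <- rnorm(200)                 # illustrative sample (an assumption, not the question's data)
h <- bw.nrd0(x)                 # rule-of-thumb bandwidth, the default of density()

# Evaluation grid, extended a little beyond the data
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)

# One normal density per data point (mean = that point, sd = h),
# evaluated at each grid value and averaged over all data points
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)))

# Built-in estimator on the same grid, for comparison
kde_builtin <- density(x, bw = h, from = min(grid), to = max(grid), n = 512)
max_abs_diff <- max(abs(kde_manual - kde_builtin$y))

# Riemann-sum approximation of the area under the manual estimate
area <- sum(kde_manual) * diff(grid)[1]

plot(grid, kde_manual, type = "l", xlab = "x", ylab = "estimated density")
lines(kde_builtin, col = "red", lty = 2)
```

Averaging instead of summing is what keeps the area under the curve at 1, which is what makes the result a density.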

For the sake of illustration, here is an image of a 1-dimensional kernel density estimator from one of your links. [Image: 1-dimensional kernel density estimate]

What about 2-dimensional kernel densities?

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
# a and b contain 1000 values each.

# Estimate the joint density of (a, b) on a 100 x 100 grid.
density <- kde2d(a, b, n = 100)

The function creates a grid from min(a) to max(a) and from min(b) to max(b); with n=100 this is a 100 x 100 grid. Instead of placing a tiny 1D normal density on every value in a or b, kde2d now places a tiny 2D normal density on every observed pair (a[i], b[i]). Just as in the 1-dimensional case, it then adds up all these densities, and it evaluates that sum at every point of the grid. So yes, the result is an estimate of the joint density f(a,b).
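As a rough check, kde2d's output can be reproduced by hand. The sketch below assumes kde2d's defaults as I read them from the MASS source: a bandwidth of bandwidth.nrd() divided by 4 in each direction, and a 2D kernel that is the product of two 1D normal densities, centred on each observed pair (a[i], b[i]) and evaluated at the grid points. The seed, the smaller grid size and the names dens, ga, gb, Ka, Kb and z_manual are my own choices.

```r
library(MASS)

set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))

n <- 25                          # smaller grid than in the question, to keep this quick
dens <- kde2d(a, b, n = n)

# Kernel standard deviations (assumed default: normal-reference bandwidth / 4)
ha <- bandwidth.nrd(a) / 4
hb <- bandwidth.nrd(b) / 4

# Grid from min to max in each direction
ga <- seq(min(a), max(a), length.out = n)
gb <- seq(min(b), max(b), length.out = n)

# Column k holds the 1D normal density centred on observation k,
# evaluated at every grid value
Ka <- sapply(a, function(ai) dnorm(ga, mean = ai, sd = ha))
Kb <- sapply(b, function(bi) dnorm(gb, mean = bi, sd = hb))

# Entry (i, j): average over observations of the product of the two
# 1D densities, i.e. the 2D kernel sum evaluated at (ga[i], gb[j])
z_manual <- tcrossprod(Ka, Kb) / length(a)

same <- isTRUE(all.equal(z_manual, dens$z))
```

If the assumptions hold, same is TRUE, and dens$z[i, j] is the estimated joint density at the point (dens$x[i], dens$y[j]).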

What do the colours mean? As @cel pointed out in the comments, the estimated density depends on two variables, so we now have three axes (a, b and the estimated density). One way to visualize three axes is with iso-probability contours. This sounds fancy, but it is basically the same as the high/low pressure maps we know from the weather forecast.
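To make the three axes concrete, it may help to look at what kde2d returns and to draw the same surface in two other ways. This is a sketch only: the seed, the variable name dens, the viewing angles and the crude Riemann-sum check are my own additions.

```r
library(MASS)

set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
dens <- kde2d(a, b, n = 100)

# A list with three components: x (grid for a), y (grid for b)
# and z (matrix of estimated density values, one per grid cell)
str(dens)

# Iso-density lines, like isobars on a weather map
contour(dens, xlab = "a", ylab = "b")

# The same surface with the third axis drawn explicitly
persp(dens$x, dens$y, dens$z, theta = 30, phi = 30,
      xlab = "a", ylab = "b", zlab = "density")

# Crude check that z behaves like a joint density:
# summing z times the area of one grid cell should be close to 1
cell_area <- diff(dens$x)[1] * diff(dens$y)[1]
total_mass <- sum(dens$z) * cell_area
```

The total is only approximately 1, because the grid stops at the range of the data and the sum is a coarse approximation of the integral.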

You are using

filled.contour(density,
    color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))
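colorRampPalette() returns a function, not a fixed set of colours; filled.contour calls that function to get one colour per contour band. A small sketch of what that function produces (the names anchors and ramp and the choice of 20 colours are mine, for illustration):

```r
colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))

# Asking for exactly 5 colours gives back the 5 anchor colours as hex codes
anchors <- colour_flow(5)

# Asking for more colours interpolates between neighbouring anchors
ramp <- colour_flow(20)

# Show the ramp as a strip: lowest level on the left, highest on the right
image(matrix(seq_along(ramp)), col = ramp, axes = FALSE)
```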

So from low to high, the plot is coloured white, blue, yellow, red and eventually darkred for the highest values of the estimated density. This results in the following plot:
