Confusion about 2D kernel density estimation in R

Question

A kernel density estimator is used to estimate a probability density function (see the mvstat.net and scikit-learn docs for references).

My confusion is about what exactly kde2d() does. Does it estimate the joint probability density function f(a, b) of the two random variables in the example below? And what do the colours mean?

Here is the code example I am referring to.

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
filled.contour(density, color.palette=colour_flow)

Answer 1

What is a kernel density estimator? Essentially, it places a little normal density curve over every data point (centred on that point) and then adds up all these little normal densities to form the kernel density estimate.
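To make the "one little bump per point" idea concrete, here is a minimal sketch in base R. The toy data `x`, the padded `grid` and the choice of `bw.nrd0()` as bandwidth are assumptions made for this illustration, not something taken from the question.

```r
set.seed(1)
x <- rnorm(50)     # toy data (an assumption for this sketch)
h <- bw.nrd0(x)    # rule-of-thumb bandwidth, the default used by density()

# Evaluation grid, padded so the tails of the outermost bumps are included
grid <- seq(min(x) - 4 * h, max(x) + 4 * h, length.out = 400)

# One normal bump per data point (centred on the point, sd = h), averaged
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)))

# The estimate is itself a density, so it should integrate to roughly 1
area <- sum(kde_manual) * diff(grid)[1]
print(area)
```

Averaging rather than plainly summing the bumps is what keeps the total area at 1; `plot(grid, kde_manual, type = "l")` should look very similar to `plot(density(x))`.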

For the sake of illustration, I will add an image of a one-dimensional kernel density estimator from one of your links.

What about two-dimensional kernel densities?

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
# a and b contain 1000 values each.

density <- kde2d(a, b, n = 100)

The function creates a 100 x 100 grid from min(a) to max(a) and from min(b) to max(b). Instead of fitting a tiny 1D normal density over every value in a or b, kde2d fits a tiny 2D normal density over every data point (a[i], b[i]). Just as in the one-dimensional case, it then adds up all these densities, and it evaluates the result at every node of the grid. So yes: density$z is an estimate of the joint density f(a, b).
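The same computation can be written out by hand and compared with kde2d(). This is a sketch under one assumption that comes from my reading of the MASS source rather than from the question: the default bandwidths are `bandwidth.nrd(a)` and `bandwidth.nrd(b)`, each divided by 4 before being used as the standard deviation of the normal kernel. A smaller sample and grid than in the question are used to keep it quick.

```r
library(MASS)
set.seed(42)
b <- log10(rgamma(200, 6, 3))
a <- log10(rweibull(200, 8, 2))
dens <- kde2d(a, b, n = 25)

# Assumption: kde2d's default bandwidths, divided by 4 as in the MASS source
h <- c(bandwidth.nrd(a), bandwidth.nrd(b)) / 4

# For every grid node (gx, gy): average, over all data points, the product of
# two 1D normal kernels centred on that data point
z_manual <- outer(dens$x, dens$y, Vectorize(function(gx, gy) {
  mean(dnorm(gx, mean = a, sd = h[1]) * dnorm(gy, mean = b, sd = h[2]))
}))

# Largest absolute difference between the hand-made grid and kde2d's grid
print(max(abs(z_manual - dens$z)))
```

The bumps are centred on the data points; the grid only decides where the summed surface is evaluated.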

What do the colours mean? As @cel pointed out in the comments: the estimated density depends on two variables, so we have three axes now (a, b and the estimated density). One way to visualize three axes is by using contours of equal density (iso-probability contours). This sounds fancy, but it is basically the same as the high/low pressure maps we know from the weather forecast.
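A quick way to see the three axes is to inspect what kde2d() returns. The sketch below reuses the data from the question; `cell_area` and `total` are names introduced only for this illustration.

```r
library(MASS)
set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)

# A list with x (100 grid values), y (100 grid values) and z (100 x 100 matrix)
str(density)

# z holds density values, not probabilities: multiplied by the area of a grid
# cell they sum to roughly 1 (not exactly, since the grid stops at the data range)
cell_area <- diff(density$x)[1] * diff(density$y)[1]
total <- sum(density$z) * cell_area
print(total)
```

The same list can be passed to other plotting functions, for example `contour(density)` for contour lines only or `persp(density, theta = 30, phi = 30)` for a 3D surface.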

You are using

filled.contour(density,
    color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))

So from low to high, the plot will be coloured white, blue, yellow, red and eventually darkred for the highest values of the estimated density. This results in the following plot:
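To see which density values end up in which colour, the defaults of filled.contour() can be reproduced by hand. This is a sketch assuming the documented defaults (`nlevels = 20`, break points from `pretty()`, one colour per band); the names `levels` and `cols` are introduced only for this illustration.

```r
library(MASS)
set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))

# About 20 "pretty" break points over the range of z,
# and one colour per band between consecutive break points
levels <- pretty(range(density$z), 20)
cols <- colour_flow(length(levels) - 1)

# Band boundaries and their colours, lowest density first
print(data.frame(from = head(levels, -1), to = tail(levels, -1), colour = cols))
```

Each row is one band of the plot: every grid cell whose estimated density falls between `from` and `to` is drawn in that colour, and the legend on the right of the filled.contour() plot shows the same mapping.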

Answer 1 (7, 2016-08-04 13:51:51)
