
Derivative of Kernel Density

I am using density {stats} to construct a kernel density estimate, with kernel "gaussian", of a vector of values. Suppose I use the following example dataset:

    x <- rlogis(1475, location=0, scale=1)  # x is a vector of values - taken from a rlogis just for the purpose of explanation
    d <- density(x = x, kernel = "gaussian")

Is there some way to get the first derivative of this density d at each of the n = 1475 points?

Edit #2:

Following up on Greg Snow's excellent suggestion to use the analytical expression for the derivative of a Gaussian, and our conversation following his post, this will get you the exact slope of the Gaussian kernel density estimate (with bandwidth d$bw) at each of those points:

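s <- d$bw
## Slope at X of the kernel centred on each data point is dnorm(x - X, sd = s) * (x - X) / s^2;
## the slope of the density estimate is the mean of those kernel slopes
slope2 <- sapply(x, function(X) {mean(dnorm(x - X, mean = 0, sd = s) * (x - X)) / s^2})
## And then, to compare to the method below (where slope is computed), plot the results against one another
plot(slope2 ~ slope)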

Edit:

OK, I just reread your question, and see that you wanted slopes at each of the points in the input vector x. Here's one way you might approximate that:

slope <- (diff(d$y)/diff(d$x))[findInterval(x, d$x)]

A possible further refinement would be to find the location of the point within its interval, and then calculate its slope as the weighted average of the slope of the present interval and the interval to its right or left.
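
One way that refinement might be sketched, assuming each segment's slope is attached to the segment's midpoint and then linearly interpolated at the data points (the names seg_slope, seg_mid and slope_interp are made up for this illustration):

```r
set.seed(1)  # only so the sketch is reproducible
x <- rlogis(1475, location = 0, scale = 1)
d <- density(x = x, kernel = "gaussian")

seg_slope <- diff(d$y) / diff(d$x)           # slope of each grid segment
seg_mid   <- head(d$x, -1) + diff(d$x) / 2   # midpoint of each segment

## Linear interpolation between neighbouring segment slopes, i.e. a
## distance-weighted average of the two nearest segments. rule = 2 reuses
## the end segment's slope for points outside the first/last midpoint.
slope_interp <- approx(seg_mid, seg_slope, xout = x, rule = 2)$y
```

This is still an approximation built on the grid returned by density(), so its accuracy depends on how fine that grid is (the n argument).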


I'd approach this by averaging the slopes of the segments just to the right and left of each point in d$x. (A bit of special care needs to be taken for the first and last points, which have no segment to their left and right, respectively.)

dy <- diff(d$y)
dx <- diff(d$x)[1]  ## Works b/c density() returns points at equal x-intervals
((c(dy, tail(dy, 1)) + c(head(dy, 1), dy))/2)/dx

The curve of a density estimator is just the sum of all the kernels, in your case Gaussian kernels, divided by the number of points. The derivative of a sum is the sum of the derivatives, and the derivative of a constant times a function is that constant times the derivative. So the derivative of the density estimate at a given point will just be the average of the slopes of the 1475 different Gaussian curves at that point. Each Gaussian curve has a mean equal to one of the data points and a standard deviation given by the bandwidth. So if you can calculate the slope of a Gaussian, then finding the slope of the density estimate is just a matter of taking the mean of the 1475 slopes.
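
As a rough sketch of that reasoning (the helper names kde_slope and kde are hypothetical, not functions from the stats package): the slope at X of a Gaussian kernel centred on data point x_i with standard deviation s is dnorm(X - x_i, sd = s) * (x_i - X) / s^2, and the slope of the density estimate is the mean of those values over the data. The finite-difference comparison at the end is only a sanity check.

```r
set.seed(1)  # only so the sketch is reproducible
x <- rlogis(1475, location = 0, scale = 1)
d <- density(x = x, kernel = "gaussian")
s <- d$bw    # bandwidth = standard deviation of each Gaussian kernel

## Hypothetical helper: slope of the Gaussian kernel density estimate at each
## point in `at`, computed as the mean of the individual kernel slopes.
kde_slope <- function(at, data, s) {
  sapply(at, function(X) mean(dnorm(X - data, mean = 0, sd = s) * (data - X)) / s^2)
}

## Hypothetical helper: the density estimate itself, as the mean of the kernels.
kde <- function(at, data, s) {
  sapply(at, function(X) mean(dnorm(X - data, mean = 0, sd = s)))
}

slope_exact <- kde_slope(x, x, s)

## Sanity check against a central finite difference of the same estimate
h <- 1e-5
slope_fd <- (kde(x + h, x, s) - kde(x - h, x, s)) / (2 * h)
max(abs(slope_exact - slope_fd))  # should be very small
```

Note that density() evaluates the estimate on a grid using a binned approximation, so slopes read off d$y can differ slightly from these analytical values.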
