
Calculating an area under a continuous density plot

I have two density curves plotted using this:

library(ggplot2)

Network <- Mydf$Networks
quartiles <- quantile(Mydf$Avg.Position, probs = c(0.25, 0.50, 0.75))
density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network))
# opts(title = ...) is no longer available in current ggplot2; ggtitle() sets the title
d <- density + geom_density(alpha = 0.2) + xlim(1, 11) + ggtitle("September 2010") + geom_vline(xintercept = quartiles, colour = "red")
print(d)

I'd like to compute the area under each curve for a given Avg.Position range, sort of like pnorm for the normal curve. Any ideas?

Calculate the density separately and plot that to start with. Then you can use basic arithmetic to get the estimate. An integral is approximated by adding together the areas of a set of narrow strips. I use the mean method for that (the trapezoid rule): the width of each strip is the difference between two adjacent x-values, and the height is the mean of the y-values at the beginning and end of the interval. I use the rollmean function in the zoo package, but this can be done in base R too.

require(zoo)

X <- rnorm(100)
# calculate the density and check the plot
Y <- density(X) # see ?density for parameters
plot(Y$x,Y$y, type="l") #can use ggplot for this too
# set an Avg.position value
Avg.pos <- 1

# construct lengths and heights
xt <- diff(Y$x[Y$x<Avg.pos])
yt <- rollmean(Y$y[Y$x<Avg.pos],2)
# This gives you the area
sum(xt*yt)

This gives you a good approximation, to about 3 decimal places. If you know the density function, take a look at ?integrate.
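Since the question asks about a range rather than a single cutoff, here is a minimal base R sketch of the same trapezoid idea without zoo. The helper name area_between and the simulated data are assumptions made for illustration; they are not part of the original answer.

```r
# Trapezoid-rule area under a kernel density estimate between a and b,
# using base R only (no zoo).
set.seed(1)                      # simulated example data (assumption)
X <- rnorm(100)
Y <- density(X)                  # see ?density for parameters

# area_between is a hypothetical helper name, not an existing R function
area_between <- function(d, a, b) {
  keep <- d$x >= a & d$x <= b    # grid points that fall inside the range
  x <- d$x[keep]
  y <- d$y[keep]
  # width of each interval times the mean of its two end heights
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

area_mid   <- area_between(Y, -1, 1)       # area for -1 <= x <= 1
area_total <- area_between(Y, -Inf, Inf)   # whole curve, should be close to 1
print(c(mid = area_mid, total = area_total))
```

Checking that the whole curve integrates to roughly 1 is a quick sanity test. For the two-network plot in the question, the same helper could be applied to a separate density() call per network.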

Three possibilities:

The logspline package provides a different method of estimating density curves, but it does include pnorm-style functions for the result.

You could also approximate the area by feeding the x and y variables returned by the density function to the approxfun function and using the result with the integrate function. Unless you are interested in precise estimates of small tail areas (or very small intervals), this will probably give a reasonable approximation.

Density estimates are just sums of kernels centered at the data; one such kernel is the normal distribution. You could average the areas from pnorm (or the equivalent function for another kernel), with the sd defined by the bandwidth and each kernel centered at one of your data points.
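The second and third possibilities can be sketched in base R as follows. This assumes the default Gaussian kernel of density(), whose bw component is the standard deviation of the kernel; the data and the range endpoints a and b are made up for illustration.

```r
set.seed(1)                  # simulated example data (assumption)
X <- rnorm(100)
Y <- density(X)              # Gaussian kernel by default
a <- -1                      # illustrative range endpoints
b <- 1

# Possibility 2: interpolate the density grid, then integrate numerically.
f <- approxfun(Y$x, Y$y, yleft = 0, yright = 0)   # treat the density as 0 outside the grid
area_interp <- integrate(f, a, b)$value

# Possibility 3: the estimate is an average of normal kernels, each with
# mean X[i] and sd equal to the bandwidth, so average their pnorm areas.
area_kernel <- mean(pnorm(b, mean = X, sd = Y$bw) - pnorm(a, mean = X, sd = Y$bw))

print(c(interp = area_interp, kernel = area_kernel))
```

The two numbers should agree closely but not exactly, because density() evaluates the estimate on a finite grid while the pnorm average works from the kernels directly.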
