
How to estimate the area of the 95% contour of a kde object from the ks R package

I'm trying to estimate the area of the 95% contour of a kde object from the ks package in R.

If I use the example data set from the ks package, I would create the kernel object as follows:

library(ks)
data(unicef)
H.scv <- Hscv(x=unicef)
fhat <- kde(x=unicef, H=H.scv)

I can easily plot the 25%, 50% and 75% contours using the plot function:

plot(fhat)

But I want to estimate the area within the contour.

I saw a similar question here, but the answer proposed does not solve the problem.

In my real application, my dataset is a time series of coordinates of an animal and I want to measure the home range size of this animal using a bivariate normal kernel. I'm using the ks package because it allows the bandwidth of a kernel distribution to be estimated with methods such as plug-in and smoothed cross-validation.

Any help would be really appreciated!

Here are two ways to do it. They are both fairly complex conceptually, but actually very simple in code.

fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
                                     z=estimate,levels=cont["95%"])[[1]])
library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677

library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,])    # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677

Explanation

First, the kde(...) function returns a kde object, which is a list with 9 elements. You can read about this in the documentation, or you can type str(fhat) at the command line, or, if you're using RStudio (highly recommended), you can see this by expanding the fhat object in the Environment tab.

One of the elements is $eval.points, the points at which the kernel density estimates are evaluated. The default is to evaluate at 151 equally spaced points. $eval.points is itself a list of, in your case, two vectors. So, fhat$eval.points[[1]] represents the points along "Under-5" and fhat$eval.points[[2]] represents the points along "Ave life exp".

Another element is $estimate, which has the z-values for the kernel density, evaluated at every combination of x and y. So $estimate is a 151 x 151 matrix.

If you call kde(...) with compute.cont=TRUE, you get an additional element in the result: $cont, a named vector holding the z-value in $estimate (the contour level) that corresponds to each percentage from 1% to 99%.
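To make the meaning of these levels concrete, here is a minimal sketch, assuming the level for a given coverage is taken as a quantile of the density heights at the sample points. The exact density of a standard bivariate normal stands in for the kernel estimate, so nothing below calls ks; `heights`, `level_95` and `exact` are illustrative names.

```r
set.seed(1)
n <- 1e5
x <- rnorm(n)
y <- rnorm(n)

# Density height at each sample point (exact density, standing in for fhat)
heights <- dnorm(x) * dnorm(y)

# Height exceeded by 95% of the points: the boundary of the 95% region
level_95 <- quantile(heights, probs = (100 - 95) / 100)
level_95                      # note the name R attaches to it: "5%"

# For this density the exact level is known in closed form
exact <- 0.05 / (2 * pi)
```

Because the quantile is taken at probability 1 - 0.95, R names the result "5%", which matches the label on the contourSizes() output shown further down. If fhat$cont follows the same convention in your version of ks, the level bounding the 95% region would be the element named "5%" rather than "95%"; inspecting names(fhat$cont), or calling contourLevels(fhat, cont = 95) directly, avoids guessing.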

So, you need to extract the x- and y-values corresponding to the 95% contour, and use them to calculate the area. You would do that as follows:

fhat <- kde(x=unicef, H=H.scv,compute.cont=TRUE)    
contour.95 <- with(fhat,contourLines(x=eval.points[[1]],y=eval.points[[2]],
                                     z=estimate,levels=cont["95%"])[[1]])

Now, contour.95 has the x- and y-values corresponding to the 95% contour of fhat. There are (at least) two ways to get the area. One uses the pracma package and calculates it directly.

library(pracma)
with(contour.95,polyarea(x,y))
# [1] -113.677

The negative value comes from the order in which the vertices are listed: contourLines(...) returned them in the orientation that polyarea(...) counts as negative, so the polygon is interpreted as a "hole". The magnitude is the area; wrap the call in abs(...) to get a positive number.
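The sign behaviour is easy to reproduce without any package. Below is a minimal sketch of the shoelace formula, assuming (as polyarea appears to) that counter-clockwise vertex order gives a positive result; `shoelace_area` is a hypothetical helper, not part of pracma.

```r
# Signed polygon area via the shoelace formula.
# The first vertex does not need to be repeated at the end.
shoelace_area <- function(x, y) {
  n <- length(x)
  nxt <- c(2:n, 1)                      # index of the following vertex
  sum(x * y[nxt] - x[nxt] * y) / 2
}

# Unit square, listed counter-clockwise and then clockwise
sq_x <- c(0, 1, 1, 0)
sq_y <- c(0, 0, 1, 1)
ccw <- shoelace_area(sq_x, sq_y)
cw  <- shoelace_area(rev(sq_x), rev(sq_y))
c(ccw = ccw, cw = cw)
```

Both orientations describe the same region, so abs() of either result is the area.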

An alternative uses the area calculation routines in rgeos (a GIS package). Unfortunately, this requires you to first turn your coordinates into a "SpatialPolygons" object, which is a bit of a bear. Nevertheless, it is also straightforward.

library(sp)
library(rgeos)
poly <- with(contour.95,data.frame(x,y))
poly <- rbind(poly,poly[1,])    # polygon needs to be closed...
spPoly <- SpatialPolygons(list(Polygons(list(Polygon(poly)),ID=1)))
gArea(spPoly)
# [1] 113.677
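rgeos has since been retired from CRAN, so the call above may not install on a current R setup. A rough equivalent with the sf package is sketched here on a stand-in rectangle rather than the actual contour; `ring` and `rect` are illustrative names, and sf must be installed.

```r
library(sf)

# Stand-in for data.frame(x, y) built from contour.95: a 2 x 3 rectangle
ring <- cbind(x = c(0, 2, 2, 0), y = c(0, 0, 3, 3))
ring <- rbind(ring, ring[1, ])          # close the ring, as above

# st_polygon() takes a list of closed rings (coordinate matrices)
rect <- st_sfc(st_polygon(list(ring)))
area <- as.numeric(st_area(rect))
area
```

With no coordinate reference system set, the area is in squared units of the input coordinates, the same as the polyarea result.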

Another method would be to use the contourSizes() function within the ks package. I've also been interested in using this package to compare both 2D and 3D space use in ecology, but I wasn't sure how to extract the 2D density estimates. I tested this method by estimating the area of an "animal" which was limited to the area of a circle with a known radius. Below is the code:

set.seed(123)
require(GEOmap)  # need this library for the inpoly function
require(ks)      # provides Hpi(), kde() and contourSizes()

# Create a data frame centered at  coordinates 0,0
data = data.frame(x=0,y=0)

# Create a vector of radians from 0 to 2*pi for making a circle to
# test the area
circle = seq(0,2*pi,length=100) 

# Select a radius for your circle
radius = 10
# Create a buffer for when you simulate points (this will be more clear below)
buffer = radius+2

# Simulate x and y coordinates from uniform distribution and combine
# values into a dataframe

createPointsX = runif(1000,min = data$x-buffer, max = data$x+buffer)
createPointsY = runif(1000,min = data$y-buffer, max = data$y+buffer)
data1 = data.frame(x=createPointsX,y=createPointsY)

# Plot the raw data
plot(data1$x,data1$y)

# Calculate the coordinates used to create a circle with center 0,0 and
# with radius specified above
coords = as.data.frame(t(rbind(data$x+sin(circle)*radius,
                           data$y+cos(circle)*radius)))
names(coords) = c("x","y")

# Add circle to plot with red line
lines(coords$x,coords$y,col=2,lwd=2)

# Use the inpoly function to calculate whether points lie within
# the circle or not.
inp = inpoly(data1$x, data1$y, coords)
data1 = data1[inp == 1,]

# Finally add points that lie within the circle as blue filled dots
points(data1$x,data1$y,pch=19,col="blue")

# Area of the circle (the known value to compare against)
pi * radius^2
#[1] 314.1593


# Sub in your own data here to calculate 95% homerange or 50% core area usage
H.pi = Hpi(data1,binned=TRUE)
fhat = kde(data1,H=H.pi)
ct1 = contourSizes(fhat, cont = 95, approx=TRUE)

# Compare the known area of the circle to the 95% contour size
ct1
#     5% 
# 291.466 
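For intuition about what an approximate contour size measures, here is a minimal sketch that estimates the area of a 95% region by counting grid nodes whose density exceeds the contour level. It is not the ks implementation; it uses the exact standard bivariate normal density in place of fhat, for which the 95% region is a disc of area pi * qchisq(0.95, 2).

```r
# Evaluation grid, analogous to fhat$eval.points
gx <- seq(-5, 5, length.out = 401)
gy <- seq(-5, 5, length.out = 401)
dens <- outer(dnorm(gx), dnorm(gy))     # density height at every grid node

# Level bounding the 95% region of this density
level <- 0.05 / (2 * pi)

# Riemann-sum area: number of nodes above the level times the cell area
cell <- diff(gx)[1] * diff(gy)[1]
area_grid <- sum(dens >= level) * cell

area_exact <- pi * qchisq(0.95, df = 2)
c(grid = area_grid, exact = area_exact)
```

With a kde object, the same count could be applied to fhat$estimate and a level from contourLevels(); counting nodes needs no polygon tracing, so separate patches are handled automatically.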

I've also tried creating two unconnected circles and testing the contourSizes() function, and it seems to work really well on disjoint distributions.
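Disjoint regions matter for the contour-tracing approach too: contourLines() returns one list element per curve, so taking only [[1]] would drop the rest. Below is a minimal sketch on a synthetic two-bump density, assuming every piece is a separate closed region (no holes, nothing cut off by the grid edge); `piece_area` is a hypothetical helper.

```r
# Equal mixture of two well-separated bivariate normals at (-4, 0) and (4, 0)
gx <- seq(-9, 9, length.out = 361)
gy <- seq(-5, 5, length.out = 201)
dens <- 0.5 * outer(dnorm(gx, mean = -4), dnorm(gy)) +
        0.5 * outer(dnorm(gx, mean =  4), dnorm(gy))

# Level chosen so that each bump is cut at radius 1.5
level <- exp(-1.5^2 / 2) / (4 * pi)
pieces <- contourLines(x = gx, y = gy, z = dens, levels = level)
length(pieces)                          # expect one closed curve per bump

# Absolute shoelace area of one closed contour piece
piece_area <- function(p) {
  n <- length(p$x)
  nxt <- c(2:n, 1)
  abs(sum(p$x * p$y[nxt] - p$x[nxt] * p$y) / 2)
}
total_area <- sum(vapply(pieces, piece_area, numeric(1)))
total_area
```

Each bump should contribute roughly pi * 1.5^2, so the total can be checked against 2 * pi * 1.5^2. If a region contains holes, the hole areas would have to be subtracted instead of added, which this sketch does not attempt.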
