Simulation-based hypothesis testing on spatial point pattern hyperframes using “envelope” function in spatstat

I want to use replicated spatial point patterns to perform hypothesis tests in spatstat. spatstat has wonderful documentation, and you can find details about replicated point pattern analysis here: https://cran.r-project.org/web/packages/spatstat/vignettes/replicated.pdf

Several mapped forest plots will represent a point process operating at different locations. Each point pattern is marked, where each mark represents a tree species. Additionally, each point pattern is associated with a covariate raster image. First, I want to create a null model which assumes that each mark has a different relationship with the covariate. Then I want to use this null model to test whether certain species are associated with (or avoid) one another by simulating the random labeling hypothesis and plotting the resulting simulation envelopes. (The random labeling hypothesis states that the marks are randomly assigned to points in the point pattern.)

First, I'll show how I would normally do this analysis using a single point pattern. Then I'll explain the two problems I'm having using hyperframes to perform the same analysis.

Let's say you have a marked point pattern, where each mark is the tree species. I'll use the "lansing" data set available in spatstat:

library(spatstat)
data(lansing)
par(mar=rep(0.5,4))
plot(split(lansing),main="")

Now let's say you want to look at some spatial covariate (e.g., soil nutrients or moisture), so you create a raster image of a kernel-smoothed density of the measurement:

sim1 <- rpoispp(function(x,y) {500 * exp(-3*x)}, win=owin(c(0,1),c(0,1)))
sim1 <- density(sim1)

First create the null model:

single.mod <- ppm(lansing ~ marks*sim1)

Here the "ppm" function recognizes "marks" as the column with the species names, and "sim1" as the covariate.

Then perform a simulation-based hypothesis test, where you're interested in whether Black Oak and Maple are found in the same locations.

single.E<-envelope.ppm(single.mod,Lcross,i="blackoak",j="maple",nsim=39, nrank=1,global=TRUE,correction="best",simulate=expression(rlabel(lansing)))
par(mar=rep(4,4))
plot(single.E,legend=FALSE,ylab="L-function",xlab="Spatial scale (m)",main="Testing random label hypothesis \nfor single point pattern")

This works fine. Now if we go out and sample a couple more plots to make our analysis more robust, we can incorporate the additional plots into a hyperframe, where each plot has its own point pattern and the plots are treated as "experimental" replicates. Each plot will get its own spatial covariate as well:

sim2 <- rpoispp(function(x,y) {500 * exp(-2*y)}, win=owin(c(0,1),c(0,1)))
sim2 <- density(sim2)

sim3 <- rpoispp(500, win=owin(c(0,1),c(0,1)))
sim3 <- density(sim3)

hyper <- hyperframe(pp=list(lansing,lansing,lansing),sims=list(sim1,sim2,sim3))

sim2 and sim3 are spatial covariates for the additional two plots we've collected, and the "hyperframe" function combines the three point patterns with their associated spatial covariates into one hyperframe.

I'd like to build a model using "mppm" (used for fitting models to multiple point patterns), where each point pattern is explained by its spatial covariate, "sims":

hyper.mod <- mppm(pp ~ sims, data = hyper)

The first problem arises when I try to allow each mark to carry a different relationship with the covariate:

int.mod <- mppm(pp ~ marks*sims, data=hyper)

This produces the following error message:

Error in checkvars(formula, data.sumry$col.names, extra = c("x", "y", : Variable "marks" in formula is not one of the names in data"

I get the same error using:

int.mod <- mppm(pp ~ pp$marks*sims, data=hyper)

The second problem is figuring out how to run the simulation-based hypothesis test on the hyperframe. Let's use the hyperframe model that worked (hyper.mod) to try this:

E.hyper <- envelope(hyper.mod,Lcross,i="blackoak",j="maple",nsim=39, nrank=1,global=TRUE,correction="best",simulate=expression(rlabel(pp)))

You get an error message:

Error in UseMethod("envelope") : no applicable method for 'envelope' applied to an object of class "c('mppm', 'list')"

This implies that "envelope" doesn't work on mppm objects (only ppp or ppm). I suspect there's a clever way to get around this limitation, but I haven't found it yet. Any suggestions or guidance would be very helpful!

The first error is a bug that was fixed a few days ago in the development version of spatstat. If you have the devtools package you can get the latest (and greatest) spatstat easily:

devtools::install_github('spatstat/spatstat')

Let me know if this is not an option for you (and also which platform you are on).

The second error is indeed because envelope isn't implemented for the class mppm, so, as you say, you have to devise a workaround for now.

I think there are a couple of issues with what you have done so far. The model is inhomogeneous, but you use Lcross rather than Lcross.inhom. Also, I think your single.E doesn't use the fitted model at all and is equivalent to:

single.E <- envelope(lansing, Lcross, i="blackoak", j="maple", nsim=39, nrank=1, global=TRUE, correction="best", simulate=expression(rlabel(lansing)))

Let me know how you progress with this. I might find time to give some more details on a workaround for the missing envelope.mppm (pooling summary functions for each pattern).
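In the meantime, here is a minimal sketch of one possible stopgap, under stated assumptions: since envelope has no mppm method, it simply runs the random labelling envelope on each pattern separately. It does not pool anything and does not adjust for multiple testing, so it is not the pooled workaround mentioned above. `label_envelope` is a hypothetical helper name, and the three copies of lansing stand in for real replicate plots, as in the question.

```r
library(spatstat)

# Three copies of lansing stand in for three replicate plots,
# mirroring the toy hyperframe in the question.
patterns <- list(lansing, lansing, lansing)

# Hypothetical helper: global random labelling envelope of Lcross
# for a single marked point pattern X.
label_envelope <- function(X, nsim = 39) {
  envelope(X, Lcross, i = "blackoak", j = "maple",
           nsim = nsim, nrank = 1, global = TRUE, correction = "best",
           # rlabel(X) randomly permutes the marks of X; the expression
           # is evaluated in this function's frame, where X is defined.
           simulate = expression(rlabel(X)),
           verbose = FALSE)
}

# One envelope object per pattern.
envs <- lapply(patterns, label_envelope)
```

Each element can then be inspected on its own, for example with plot(envs[[1]], legend=FALSE). With real replicates, nsim would need to be chosen with the number of patterns in mind.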

There is no envelope.mppm in spatstat because some of the statistical issues (related to multiple hypothesis testing) have not yet been resolved.

The fastest solution is probably to use cdf.test.mppm, which will perform a test and give some graphical output.

It would only be reasonable to pool the envelopes (of the K functions, say) from different point patterns to obtain a single envelope if the fitted model implies that the different patterns should be statistically equivalent. That would not be valid if, e.g., the model includes covariates which are different for different patterns.

A better strategy is probably to plot global envelopes for each pattern, and use the product rule for multiple testing. Suppose there are M point patterns in the data, and you want a test (of the fitted model) of significance level alpha (usually alpha = 0.05). Then you're going to construct M envelopes, one for each pattern, each with significance level gamma = 1 - (1-alpha)^(1/M). Each envelope will be generated by envelope.ppp with global=TRUE and nsim = 1/gamma - 1 approximately. Example: if M = 10 and alpha = 0.05 then gamma = 1 - 0.95^(1/10) = 0.0051, so nsim = 1/gamma - 1 is about 194.46; call it nsim=195. Since you're going to make global envelopes you may need twice that number of simulations, as explained in the help for envelope. So do as follows, where fit is the fitted model and H is the original hyperframe of data:

sims <- simulate(fit, nsim=2*195)
SIMS <- list()
for(i in 1:nrow(sims)) SIMS[[i]] <- as.solist(sims[i,,drop=TRUE])
Hplus <- cbind(H, hyperframe(Sims=SIMS))

This augments the original hyperframe by adding a column of simulated patterns (each entry in the column is a 'solist' containing 2*195 patterns). Then do the following (where X is the column of H containing the original point pattern datasets):

EE <- with(Hplus, envelope(X, Lest, global=TRUE, nsim=195, simulate=Sims))
plot(EE)

This produces a plot with many panels of envelopes. The interpretation is that, if any one of the observed L functions wanders outside the corresponding envelopes, the result is significant: the test rejects the null hypothesis that the fitted model is true.
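The product-rule arithmetic used to choose nsim can be checked in a few lines of base R. This is only a sketch; M and alpha are the example values from the text, and the variable names are illustrative.

```r
# Example values from the text: M patterns, overall significance level alpha.
M     <- 10
alpha <- 0.05

# Per-pattern significance level under the product rule.
gamma <- 1 - (1 - alpha)^(1 / M)

# Number of simulations per envelope so that a global envelope
# has significance level roughly gamma; round up to a whole number.
nsim_exact <- 1 / gamma - 1
nsim       <- ceiling(nsim_exact)

# Sanity check: M independent tests at level gamma give overall level alpha.
overall <- 1 - (1 - gamma)^M

print(c(gamma = gamma, nsim_exact = nsim_exact, nsim = nsim, overall = overall))
```

Changing M shows how quickly the required number of simulations grows with the number of patterns.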
