简体   繁体   English

来自 disaggregation R 包的 disag_model() 函数的问题

[英]issue with disag_model() function from disaggregation R package

I was trying to use the disaggregation package to evaluate if it could be used on the dataset I have.我试图使用分解包来评估它是否可以用于我拥有的数据集。 My original data are disaggregated, so I've aggregated them to use the disag_model function from disaggregation package and compare "fitted values" with actual values.我的原始数据是分解的,所以我聚合了它们以使用分解包中的 disag_model 函数并将“拟合值”与实际值进行比较。 However when I run the function the R session aborts.但是,当我运行该函数时,R 会话中止。 I tried to execute the disag_model function step by step and I saw that the problem is due to the use of nlminb() to optimize the a posteriori density function, but I cannot understand why it's happening and how to solve it.我尝试逐步执行 disag_model 函数,我看到问题是由于使用 nlminb() 来优化后验密度函数,但我无法理解为什么会发生这种情况以及如何解决它。 Thanks for your help.谢谢你的帮助。

You can find the data I used at this link: https://www.dropbox.com/sh/au7l0e11trzfo19/AACpfRSUpd4gRCveUsh5JX6Ea?dl=0 Please download the folder to run the code.您可以在此链接中找到我使用的数据: https : //www.dropbox.com/sh/au7l0e11trzfo19/AACpfRSUpd4gRCveUsh5JX6Ea?dl=0请下载文件夹以运行代码。 This is the code I used:这是我使用的代码:

library(tidyverse)
library(raster)
library(disaggregation)
library(sp)

path<- "yourPath/Data"

load(file.path(path, "myRS"))
load(file.path(path, "RAST"))


Data <- read.csv(file = paste(path, "/sim_data.csv", sep = ""))
Data$HasRes <- ifelse(Data$PN50 > runif(nrow(Data)), 1, 0)
for (i in 1:nlayers(myRS)) {
  myRS@layers[[i]]@file@name<-file.path(path, "predStackl10")
}
DFCov <-
  as.data.frame(raster::extract(myRS, Data[c("XCoord", "YCoord")]))
Data <- cbind(Data, DFCov)

# Remove NA
NAs <- which(is.na(rowSums(Data[names(myRS)])))
Data <- Data[-NAs, ]
Data$ISO3 <- as.factor(Data$ISO3)

world_shape <-
  shapefile(file.path(path, "World.shp"))
lmic_shape <-
  world_shape[(world_shape@data$ISO3 %in% levels(Data$ISO3)),]
plot(lmic_shape)


# I would like to convert Data in a SpatialPointsDataFrame object
PN50 <- Data
coordinates(PN50) <- c("XCoord", "YCoord")
is.projected(PN50) # see if a projection is defined
proj4string(PN50) <- CRS("+proj=longlat +datum=WGS84")



# compute the mean P50 within each state
PN50_mean <- aggregate(x = PN50,
                       by = list(Data$ISO3),
                       FUN = mean)
# compute the centroid of the observations coordinates for each state
PN50_centroid <-
  Data %>% group_by(ISO3) %>% summarise(meanX = mean(XCoord), meanY = mean(YCoord))

# assign to each mean the centroid coordinates
PN50_agg <-
  as.data.frame(
    cbind(
      PN50_mean = PN50_mean@data$PN50,
      XCoord = PN50_centroid$meanX,
      YCoord = PN50_centroid$meanY
    )
  )
PN50_agg$XCoord <- as.numeric(PN50_agg$XCoord)
PN50_agg$YCoord <- as.numeric(PN50_agg$YCoord)
PN50_agg$ISO3 <- as.character(PN50_centroid$ISO3)
samsiz <-
  Data %>% group_by(ISO3) %>% summarise(sz = sum(SampleSize))
PN50_agg$sample_size <- as.numeric(samsiz$sz)
PN50_agg$case <- round(PN50_agg$PN50_mean * PN50_agg$sample_size)

# I would like having data in a SpatialPolygonsDataFrame format to use the disaggrgation package
library(sp)
coordinates(PN50_agg) <- c("XCoord", "YCoord")
proj4string(PN50_agg) <- CRS("+proj=longlat +datum=WGS84")
PN50_polyg <- lmic_shape
PN50_polyg@data <-
   full_join(PN50_polyg@data, PN50_agg@data, by = "ISO3")


# covariates raster 

covariate_stack <-
  getCovariateRasters(path, shape = raster(x = paste0(path, '/multi.tif')))
names(covariate_stack)
covariate_stack2 <- dropLayer(covariate_stack, nlayers(covariate_stack))
names(covariate_stack2)
plot(covariate_stack2)
covariate_stack2 <- raster::stack(covariate_stack2)
covariate_stack2<-brick(covariate_stack2)

# population raster

extracted <- raster::extract(raster(x = paste0(path, '/multi.tif')), PN50_polyg)
n_cells <- sapply(extracted, length)
PN50_polyg@data$pop_per_cell <- PN50_polyg@data$sample_size / n_cells

population_raster <-
  rasterize(PN50_polyg, covariate_stack2, field = 'pop_per_cell')

# prepare data for disag_model()

dis_data <- prepare_data(
  polygon_shapefile = PN50_polyg,
  covariate_rasters = covariate_stack2,
  aggregation_raster = population_raster,
  mesh.args = list(
    max.edge = c(5, 40),
    cut = 0.0005,
    offset = 1
  ),
  id_var = "ISO3",
  response_var = "case",
  sample_size_var = "sample_size",
  na.action = TRUE,
  ncores = 8
)


# Rho and p(Rho<Rho_min)
dist <- pointDistance(PN50_agg@coords, lonlat = F, allpairs = T)
rownames(dist) <- PN50_agg$ISO3
colnames(dist) <- PN50_agg$ISO3

flattenDist <- function(dist) {
  up <- upper.tri(dist)
  flat <- data_frame(row = rownames(dist)[row(dist)[up]],
                     column = rownames(dist)[col(dist)[up]],
                     dist = dist[up])
  return(flat)
}
pair_dist <- flattenDist(dist)
d <- pair_dist$dist
k <- 0.036
CorMatern <- k * d * besselK(k * d, 1)
limits <- sp::bbox(PN50_polyg)
hypontenuse <-
  sqrt((limits[1, 2] - limits[1, 1]) ^ 2 + (limits[2, 2] - limits[2, 1]) ^
         2)
prior_rho <- hypontenuse / 3
p_rho <- sum(d[CorMatern <= 0.1] < prior_rho) / length(d[CorMatern <= 0.1])

# sigma and p(sigma>sigma_max)
sigma_boost <- function(data, i) {
  sd(data[i] / mean(data[i]))
}
sigma <-
  boot(data = dis_data$polygon_data$response,
       statistic = sigma_boost,
       10000)

prior_sigma <- sigma$t0
p_sigma <- sum(sigma$t >= sigma$t0) / length(sigma$t)

default_priors <-
  list(
    priormean_intercept = 0,
    priorsd_intercept = 4,
    priormean_slope = 0,
    priorsd_slope = 2,
    prior_rho_min = prior_rho,
    prior_rho_prob = p_rho,
    prior_sigma_max = prior_sigma,
    prior_sigma_prob = p_sigma,
    prior_iideffect_sd_max = 0.1,
    prior_iideffect_sd_prob = 0.01
  )

fitted_model <- disag_model(
  data = dis_data,
  iterations = 1000,
  family = "binomial",
  link = "logit",
  # priors = default_priors,
  field = TRUE,
  iid = TRUE,
  silent = TRUE
)

I was able to run the disag_model function using your dis_data object.我能够使用您的 dis_data 对象运行 disag_model 函数。 There were no errors or crashes.没有错误或崩溃。 I ran the following lines.我运行了以下几行。

fitted_model <- disag_model(
  data = dis_data,
  iterations = 1000,
  family = "binomial",
  link = "logit",
  field = TRUE,
  iid = TRUE,
  silent = TRUE
)

I am running on a Windows machine with 64GB RAM and 8 cores.我在具有 64GB RAM 和 8 个内核的 Windows 机器上运行。 It took over an hour and used all of my RAM for a while and up to 50% of my CPU, which is not surprising as you are fitting 5.5M pixels over the whole world.花了一个多小时,用了我所有的内存一段时间,CPU 的使用率高达 50%,这并不奇怪,因为您在整个世界中拟合了 550 万个像素。 Therefore, I suspect it is related to your computer running out of resources.因此,我怀疑这与您的计算机资源不足有关。 I suggest you try a smaller example to test it out first.我建议你先尝试一个较小的例子来测试它。 Try fewer polygons and fewer pixels in each polygon.尝试在每个多边形中使用更少的多边形和更少的像素。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM