
posterior_survfit() is not using the new data frame (nd)

I am having trouble generating posterior predictions with posterior_survfit(). I pass a new data frame, but the function ignores it and instead uses values from the dataset I used to fit the model. The predictors in the model are New.Treatment (categorical, 6 treatments), Openness (a continuous light index; min = 2.22, mean = 6.903221, max = 10.54), subplot_by_site (categorical, 720 sites), and New.Species.name (categorical, 165 species). My new data frame has 94 rows, but posterior_survfit() returns 3017800 rows. Help, please!

head(nd)
  New.Treatment Openness
1            BE        5
2            BE        6
3            BE        7
4            BE        8
5            BE        9
6            BE       10


fit= stan_surv(formula = Surv(days, Status_surv) ~ New.Treatment*Openness + (1 |subplot_by_site)+(1|New.Species.name),
  data = dataset,
  basehaz = "weibull",
  chains=4,
  iter = 2000,
  cores =4 )

Post=posterior_survfit(fit, type="surv",
                        newdata=nd5)

head(Post)
  id cond_time    time median  ci_lb  ci_ub
1  1        NA 62.0000 0.9626 0.9623 1.0000
2  1        NA 69.1313 0.9603 0.9600 0.9997
3  1        NA 76.2626 0.9581 0.9579 0.9696
4  1        NA 83.3939 0.9561 0.9557 0.9665
5  1        NA 90.5253 0.9541 0.9537 0.9545
6  1        NA 97.6566 0.9522 0.9517 0.9526

## Here is some reproducible code to explain my problem:

library(rstanarm)

data_NHN<- expand.grid(New.Treatment = c("A","B","C"), Openness = c(seq(2, 11, by=0.15)))
data_NHN$subplot_by_site=c(rep("P1",63),rep("P2",60),rep("P3",60))
data_NHN$Status_surv=sample(0:1,183, replace=TRUE) 
data_NHN$New.Species.name=c(rep("sp1",10),rep("sp2",40),rep("sp1",80),rep("sp2",20),rep("sp1",33))
data_NHN$days=sample(10, size = nrow(data_NHN), replace = TRUE)

nd_t<- expand.grid(New.Treatment = c("A","B","C"), Openness = c(seq(2, 11, by=1)))


mod= stan_surv(formula = Surv(days, Status_surv) ~ New.Treatment+Openness + (1 |subplot_by_site)+(1|New.Species.name),
                  data =data_NHN,
                  basehaz = "weibull",
                  chains=4,
                  iter = 30,
                  cores =4)

summary(mod)
pos=posterior_survfit(mod, type="surv",
                        newdataEvent=nd_t,
                      times = 0)
head(pos)

# I am interested in predicted values at specific Openness values
# (nd_t = 30 rows), but instead I am getting values for each point in time
# (pos = 18300 rows)

Operating system: Mac OS Catalina 10.15.6; R version: 4.0; rstan version: 2.21.2; rstanarm version: 2.21.2.

Any suggestions on why it is not working? It is also not clear to me how to plot the effect of one variable in the interaction as the other changes, together with the associated uncertainty (i.e., a marginal effects plot). In my example, I want the predicted values at specific "Openness" values, not at each time point as they appear in the posterior_survfit() output. TIA.

I can only give you a partial answer. You should be aware that this is pretty bleeding-edge; the arXiv paper (dated Feb 2020) says:

Hopefully by the time you are reading this, the functionality will be available in the stable release on the Comprehensive R Archive Network (CRAN)

So far, though, that has not happened: the functionality is not even in the master branch on GitHub, so I used remotes::install_github("stan-dev/rstanarm@feature/survival") to install it from source.

The proximal problem is that you should specify the new data frame as newdata, not newdataEvent. There seem to be a lot of mismatches between the master branch and this branch, and between the docs and the code: newdataEvent is used in the older method for stanjm models, but not for stansurv models. You can look at the code here, or use formals(rstanarm:::posterior_survfit.stansurv). Unfortunately, because this method has an (unused, unchecked) ... argument, any misnamed arguments are silently ignored.
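To illustrate the silent-ignore behaviour in isolation, here is a minimal base-R sketch. The function predict_sketch() is a hypothetical stand-in, not rstanarm code; it only mimics a method that has a newdata argument followed by an unused ... argument.

```r
# Hypothetical stand-in for a method with an unused, unchecked `...` argument.
# It is NOT the rstanarm implementation; it only mimics the argument handling.
predict_sketch <- function(object, newdata = NULL, ...) {
  if (is.null(newdata)) "used fitting data" else "used newdata"
}

nd <- data.frame(x = 1:3)

# Misnamed argument: `newdataEvent` does not match `newdata`, so it is
# absorbed by `...` and the function falls back to the fitting data.
res_wrong <- predict_sketch("fit", newdataEvent = nd)

# Correctly named argument.
res_right <- predict_sketch("fit", newdata = nd)

print(res_wrong)
print(res_right)
```

No warning or error is raised for the misnamed call, which is why the symptom is simply "my new data frame is not being used".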

The next problem is that if you specify the new data in your example as newdata you'll get

Error: The following variables are missing from the data: subplot_by_site, New.Species.name

That is, there doesn't seem to be an obvious way to generate a population-level posterior prediction. (Setting the random effects grouping variables to NA isn't allowed.) If you want to do this, you could either:

  • expand your newdata to include all combinations of the grouping variables in your data set, and average the results across levels yourself;
  • post an issue on GitHub or contact the maintainers...
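The first option could be sketched roughly as follows, using the objects from the reproducible example in the question. Several things here are assumptions rather than documented behaviour: that the id column of the posterior_survfit() output indexes the rows of newdata, that the summary column is called median (as in the output shown in the question), and the helper name average_over_groups(), which is made up for illustration. Averaging posterior medians across groups is also only a rough summary; averaging draw by draw would be more careful.

```r
# Prediction grid from the question: 3 treatments x 10 Openness values.
nd_t <- expand.grid(New.Treatment = c("A", "B", "C"),
                    Openness = seq(2, 11, by = 1))

# All combinations of the grouping variables used in the example model.
groups <- expand.grid(subplot_by_site = c("P1", "P2", "P3"),
                      New.Species.name = c("sp1", "sp2"))

# Cross join (Cartesian product): every grid row paired with every group combo.
nd_full <- merge(nd_t, groups, by = NULL)
nd_full$id <- seq_len(nrow(nd_full))  # assumed to match the `id` in the output

# Hypothetical helper: attach covariates to the predictions by `id`, then
# average the summary column over the grouping levels.
average_over_groups <- function(pos, nd_full) {
  merged <- merge(as.data.frame(pos), nd_full, by = "id")
  aggregate(median ~ New.Treatment + Openness + time, data = merged, FUN = mean)
}

# Usage with the survival branch installed (not run here):
# pos <- posterior_survfit(mod, type = "surv", newdata = nd_full)
# avg <- average_over_groups(pos, nd_full)
```

The result would have one row per combination of New.Treatment, Openness, and time, which can then be filtered to the time points of interest.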
