Specifying more complex random effects structure in cluster covariance tests (in model based recursive partitioning)? (e.g., MOB, glmertree, etc.)

How can I specify a more complex data structure than a simple ID column?

If I have a glmertree model, how can I specify, for example, a cross-classified structure for the cluster covariance tests?

tree_1 <- 
  glmertree(
    data = sim_dat, 
    formula = 
      performance ~ 1 + predictors | 
      (1 | student_id) + (1 | question_number) | 
      partitioning_variables, 
    family = 'binomial',
    cluster = ???
  )

Or how about in a simple nested design?

tree_2 <- 
  lmertree(
    data = sim_dat, 
    formula = 
      test_score ~ 1 + predictors | 
      (1 | district/school) | 
      ## equivalent to (1|school:district) + (1|district)
      partitioning_variables, 
    cluster = ???
  )

So far, I've fit models with the cluster covariance tests set to whichever level has the greatest variance in the outcome, but specifying the proper structure seems more appropriate, if that is possible.

Thanks!

I hope I understand your question correctly; as per my comment to your question above, some more info might be helpful. This is a preliminary answer:

The cluster argument should be specified so that the parameter stability tests are performed at the right level. In most (but not all) cases, I would expect this to be a single level, so only a single clustering variable needs to be passed to the cluster argument.

In tree_1, if all partitioning variables are measured at the same level (i.e., all are characteristics of either the students or the questions), then you specify either cluster = question_number or cluster = student_id. If some partitioning variables are measured at the student level and some at the question level, it's going to be more complex, and we will have to look into that (I am one of the package authors).
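As a rough sketch of the above, the level of a partitioning variable can be checked before choosing the cluster variable. The toy data, the variable names age and predictor, and the helper constant_within() are made up for illustration and are not part of glmertree; the model call reuses the formula layout from the question and only runs if glmertree is installed.

```r
# Hypothetical toy data: 30 students crossed with 10 questions.
set.seed(1)
sim_dat <- expand.grid(student_id = factor(1:30),
                       question_number = factor(1:10))

# 'age' is an assumed student-level partitioning variable:
# one value per student, repeated over that student's rows.
student_age <- round(runif(30, 10, 18))
sim_dat$age <- student_age[as.integer(sim_dat$student_id)]

# Simulated predictor and binary outcome with crossed random intercepts.
sim_dat$predictor <- rnorm(nrow(sim_dat))
u_student  <- rnorm(30)[as.integer(sim_dat$student_id)]
u_question <- rnorm(10)[as.integer(sim_dat$question_number)]
sim_dat$performance <- rbinom(
  nrow(sim_dat), 1,
  plogis(0.5 * sim_dat$predictor + u_student + u_question)
)

# Helper (not from glmertree): TRUE if x has a single value within each level of g.
constant_within <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v)) == 1))
}

at_student_level  <- constant_within(sim_dat$age, sim_dat$student_id)
at_question_level <- constant_within(sim_dat$age, sim_dat$question_number)

# All partitioning variables are student-level here, so cluster on student_id.
if (requireNamespace("glmertree", quietly = TRUE) && at_student_level) {
  tree_1 <- glmertree::glmertree(
    performance ~ 1 + predictor |
      (1 | student_id) + (1 | question_number) |
      age,
    data = sim_dat,
    family = "binomial",
    cluster = student_id
  )
}
```

If the partitioning variables were instead constant within question_number, the same call would use cluster = question_number.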

In tree_2, I assume that a single school can only be part of a single district. If all partitioning variables are measured at the district level, you specify cluster = district. If all partitioning variables are measured at the school level, then make sure that the school variable has a unique identifier for each school, and specify cluster = school. If a single school can be part of multiple districts, and partitioning variables are measured at both the district and the school level, then we will have to look into that.
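One way to get a unique school identifier, sketched here on made-up data where school labels restart at 1 within each district, is to combine district and school with base R's interaction(). The column name school_uid is hypothetical, and the lmertree call is shown only as a comment because the toy data has no outcome to fit.

```r
# Hypothetical toy data: school labels 1..3 are reused in both districts.
sim_dat <- data.frame(
  district = factor(rep(c("A", "B"), each = 6)),
  school   = factor(rep(rep(1:3, each = 2), times = 2))
)

# Combine district and school so every school gets its own level.
sim_dat$school_uid <- interaction(sim_dat$district, sim_dat$school, drop = TRUE)

n_labels  <- nlevels(sim_dat$school)      # distinct raw labels
n_schools <- nlevels(sim_dat$school_uid)  # distinct schools after combining

# Sanity check: each unique school belongs to exactly one district.
one_district <- all(tapply(
  as.character(sim_dat$district), sim_dat$school_uid,
  function(d) length(unique(d)) == 1
))

# With school-level partitioning variables, the call from the question
# would then cluster on the unique identifier, e.g.:
# lmertree(
#   test_score ~ 1 + predictors | (1 | district/school) | partitioning_variables,
#   data = sim_dat,
#   cluster = school_uid
# )
```

Here the raw school column has only three labels for six actual schools, which is why passing it directly as the cluster variable would merge unrelated schools.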
