简体   繁体   English

嵌套data.frame内的条件变异map lm

[英]Conditional mutate map lm within a nested data.frame

I am trying to run a stratified analysis fitting a slightly different model in a particular strata using a nested data.frame and map mutate approach:我正在尝试使用嵌套 data.frame 和 map 变异方法在特定层中运行分层分析,以拟合稍微不同的 model:

cars_nest <- mtcars %>%
  group_by(cyl) %>%
  nest()

model1 <- function(df) {
  lm(mpg ~ disp + wt, data = df)
}

model2 <- function(df) {
  lm(mpg ~ disp + wt + factor(vs), data = df)
}

cars_nest %>% 
  mutate(
    model = case_when(
      cyl == 8 ~ map(data, model1),
      cyl %in% c(4, 6) ~ map(data, model2)
    )
  )

I'm getting the error我收到错误

Error: Problem with mutate() input model .错误: mutate()输入model有问题。 x contrasts can be applied only to factors with 2 or more levels ℹ Input model is case_when(...) . x 对比只能应用于具有 2 个或更多级别的因子ℹ 输入modelcase_when(...) ℹ The error occurred in group 3: cyl = 8. ℹ 错误发生在第 3 组:cyl = 8。

I assume this is due to the vectorized operation of case_when as the below does seem to work.我认为这是由于 case_when 的矢量化操作,因为下面似乎确实有效。

cars_nest %>% 
  mutate(
    model = ifelse(cyl == 8, map(data, model1),
                   ifelse(cyl %in% c(4, 6), map(data, model2), NA)
    )
  )

Is there a way to make this work using case_when()?有没有办法使用 case_when() 来完成这项工作?

NOTE: the factor in model2 is necessary to replicate the issue.注意:模型 2 中的因子是复制问题所必需的。 in the actual model, this is a factor that has only one level in the first strata and more than one level in the second strata.在实际的model中,这是一个在第一层只有一个层次,在第二层有一个层次以上的因子。

We could use if/else我们可以使用if/else

library(dplyr)
mtcars %>%
   nest_by(cyl) %>% 
   mutate(model = if(cur_group() == 8) list(model1(data)) else 
              list(model2(data))) %>% 
   ungroup

-output -输出

# A tibble: 3 x 3
#    cyl           data model   
#  <dbl> <list<tibble>> <list>
#1     4      [11 × 10] <lm>  
#2     6       [7 × 10] <lm>  
#3     8      [14 × 10] <lm>  

Or using the OP's code或使用 OP 的代码

library(purrr)
cars_nest %>% 
  mutate(
    model = 
      if(cur_group() == 8) map(data, model1)
      else map(data, model2)
    )
# A tibble: 3 x 3
# Groups:   cyl [3]
#    cyl data               model 
#  <dbl> <list>             <list>
#1     6 <tibble [7 × 10]>  <lm>  
#2     4 <tibble [11 × 10]> <lm>  
#3     8 <tibble [14 × 10]> <lm>  

You can also use the following solution, however there is not much difference between all these:您也可以使用以下解决方案,但是所有这些之间没有太大区别:

library(dplyr)
library(purrr)
library(broom)

cars_nest %>%
  mutate(model = ifelse(cyl %in% c(4, 6), map(data, ~ model2(.)), 
                        map(data, ~ model1(.))), 
         glance = map(model, ~ glance(.x))) %>%
  unnest(glance) %>%
  select(-data)

# A tibble: 3 x 14
# Groups:   cyl [3]
    cyl model r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance
  <dbl> <lis>     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>
1     6 <lm>      0.702         0.403  1.12      2.35  0.251      3  -7.78  25.6  25.3     3.78
2     4 <lm>      0.659         0.513  3.15      4.51  0.0461     3 -25.7   61.5  63.5    69.3 
3     8 <lm>      0.425         0.320  2.11      4.06  0.0477     2 -28.6   65.3  67.8    49.0 
# ... with 2 more variables: df.residual <int>, nobs <int>

Just remove factor from model2 :只需从model2中删除factor

model2 <- function(df) {
  lm(mpg ~ disp + wt + vs, data = df)
}

Then然后

cars_nest %>% 
  mutate(
    model = case_when(
      cyl == 8 ~ map(data, model1),
      cyl %in% c(4, 6) ~ map(data, model2)
    )
  ) %>% 
  print

results in结果是

# A tibble: 3 x 3
# Groups:   cyl [3]
    cyl data                    model 
  <dbl> <list>                  <list>
1     6 <tibble[,10] [7 x 10]>  <lm>  
2     4 <tibble[,10] [11 x 10]> <lm>  
3     8 <tibble[,10] [14 x 10]> <lm> 

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM