使用从 R 的 tidyverse 'map' 的 output 中提取的 lm 使用 'segmented' 时出错

Question

我正在使用“分段”package 在 R 的线性回归中找到断点

library(tidyverse)
library(segmented)

df <- data.frame(x = c(1:10), y = c(1,1,1,1,1,6:10))
lm_model <- lm(y ~ x, data = df)
seg_model <- segmented(obj = lm_model, seg.Z = ~ x)

但是，如果我在 purrr:map 中运行相同的 model，则分段失败。

map_test <- df %>% 
  nest() %>%
  mutate(map_lm = map(data, ~lm(y ~ x, data = .)),
         param_map_lm = map(map_lm, tidy))

map_lm_model <- map_test[[2]][[1]]

map_seg_model <- segmented(obj = map_lm_model, seg.Z = ~ x)

“is.data.frame（数据）中的错误：object '。' 未找到”

从map output中提取的lm中取出lm obj时，segmented找不到底层数据。

然而，两个线性 model 对象看起来是相同的。

我真正需要做的是一个更有用的 map 在 dataframe 的多个子集上运行 lm，然后在生成的 lm 上运行“分段”。

Answer 1

这与 glm() 和 purrr::map() 之间的交互基本相同。

lm()捕获提供给它的表达式，它作为一个独立的案例工作得很好。 但是，当被map()调用时，提供的表达式是. ，在该map()调用的直接上下文之外没有任何意义，并导致您观察到的错误。

与另一个问题一样，一种解决方法是为lm()定义一个包装器，该包装器直接在数据集上组成自定义调用，然后lm()将其捕获为未评估的表达式。

# Composes a custom lm() expression and evaluates it
lm2 <- function(data, ...)
    eval( rlang::expr(lm(data=!!rlang::enexpr(data), !!!list(...))) )

# Now mapping using lm2, instead of lm
map_test <- nest(df, data=everything()) %>% 
    mutate(map_lm       = map(data, lm2, y ~ x),
           param_map_lm = map(map_lm, broom::tidy))

# The data is stored directly inside the lm object
# segmented() now has no problems accessing it
map_lm_model <- map_test[[2]][[1]]
map_seg_model <- segmented(obj = map_lm_model, seg.Z = ~ x)
# Call: segmented.lm(obj = map_lm_model, seg.Z = ~x)
# 
# Meaningful coefficients of the linear terms:
# (Intercept)            x         U1.x  
#   1.000e+00    6.344e-15    1.607e+00  
# 
# Estimated Break-Point(s):
# psi1.x  
#  3.889

或作为单个mutate()链：

map_test <- nest(df, data=everything()) %>% 
    mutate(map_lm       = map(data, lm2, y ~ x),
           param_map_lm = map(map_lm, broom::tidy),
           seg_lm       = map(map_lm, segmented, seg.Z=~x))
# # A tibble: 1 x 4
#             data map_lm param_map_lm     seg_lm    
#   <list<df[,2]>> <list> <list>           <list>    
# 1       [10 × 2] <lm>   <tibble [2 × 5]> <segmentd>

Answer 2

您可以将 model 数据与 model object 一起存储，并替换. 在 model 对象的数据name中按存储的数据name ：

map_test <- df %>% 
  nest(data=everything()) %>%
  mutate(map_lm = map(data, ~lm(y ~ x, data = . )),
         data_map_lm = data, # Store model data
         param_map_lm = map(map_lm, broom::tidy))

map_lm_model <- map_test$map_lm[[1]]
map_lm_data <- map_test$data_map_lm[[1]]

# Set stored data name in model
map_lm_model$call$data <- as.name("map_lm_data")

map_seg_model <- segmented(obj = map_lm_model, seg.Z = ~ x)
map_seg_model

# Call: segmented.lm(obj = map_lm_model, seg.Z = ~x)
# 
# Meaningful coefficients of the linear terms:
#   (Intercept)            x         U1.x  
# 1.000e+00   -1.874e-08    1.607e+00  
# 
# Estimated Break-Point(s):
#   psi1.x  
# 3.889

在具有更多级别的nest上：

df <- data.frame(x = c(c(1:10),c(11:20)),
                 y = rep(c(1,1,1,1,1,6:10),2),
                 level=c(rep('a',10),rep('b',10)))


map_test <- df %>%
  nest(data=c(x,y)) %>%
  mutate(map_lm = map(data, ~lm(y ~ x, data = . )),
         data_map_lm = data,
         param_map_lm = map(map_lm, broom::tidy))


map_test %>% pmap(~with(list(...),{
  map_lm$call$data <- as.name("data_map_lm")
  segmented(obj = map_lm, seg.Z = ~ x)
}))

#> [[1]]
#> Call: segmented.lm(obj = map_lm_model, seg.Z = ~x)
#> 
#> Meaningful coefficients of the linear terms:
#> (Intercept)            x         U1.x  
#>   1.000e+00   -1.874e-08    1.607e+00  
#> 
#> Estimated Break-Point(s):
#> psi1.x  
#>  3.889  
#> 
#> [[2]]
#> Call: segmented.lm(obj = map_lm_model, seg.Z = ~x)
#> 
#> Meaningful coefficients of the linear terms:
#> (Intercept)            x         U1.x  
#>   1.000e+00   -1.937e-08    1.607e+00  
#> 
#> Estimated Break-Point(s):
#> psi1.x  
#>  13.89

Answer 3

一个选项可以是使用purrr中的map2来立即在segmented model 上使用“map_lm”model。 在这种情况下，使用.y以便数据保持与使用lm model 的名称相同的名称。 这是一个可重现的示例：

library(dplyr)
library(tidyr)
library(segmented)
library(broom)
library(purrr)

df <- data.frame(x = c(1:10), y = c(1,1,1,1,1,6:10))
lm_model <- lm(y ~ x, data = df)
seg_model <- segmented(obj = lm_model, seg.Z = ~ x)

# Combine in one dataframe
map_test <- df %>% 
  nest(data = everything()) %>%
  mutate(map_lm = purrr::map(data, ~lm(y ~ x, data = .)),
         map_seg = purrr::map2(data, map_lm, ~segmented(.y, seg.Z = ~ x)))

# Show segmented model
map_test[[3]]
#> [[1]]
#> Call: segmented.lm(obj = .y, seg.Z = ~x)
#> 
#> Meaningful coefficients of the linear terms:
#> (Intercept)            x         U1.x  
#>   1.000e+00   -1.874e-08    1.607e+00  
#> 
#> Estimated Break-Point(s):
#> psi1.x  
#>  3.889

# Plot lm and seg model
plot(map_test$data[[1]])
lines(fitted(map_test$map_lm[[1]]), col = "red")
lines(fitted(map_test$map_seg[[1]]), col = "blue")
legend(x = "topleft", 
       legend = c("lm", "seg"),
       pch = 15,
       col = c("red", "blue"))

^{使用reprex v2.0.2创建于 2022-09-04}

使用从 R 的 tidyverse 'map' 的 output 中提取的 lm 使用 'segmented' 时出错

问题描述

3 个解决方案

解决方案1
0 2019-10-23 19:54:03

解决方案2
0 2022-09-03 08:51:54

解决方案3
0 2022-09-04 18:08:06

使用从 R 的 tidyverse 'map' 的 output 中提取的 lm 使用 'segmented' 时出错

问题描述

3 个解决方案

解决方案1 0 2019-10-23 19:54:03

解决方案2 0 2022-09-03 08:51:54

解决方案3 0 2022-09-04 18:08:06

解决方案1
0 2019-10-23 19:54:03

解决方案2
0 2022-09-03 08:51:54

解决方案3
0 2022-09-04 18:08:06