Running multiple linear regression models in a for loop

Question

The logic is similar to a content-based recommender:

content	undesirable	desirable	user_1	user_10
1	3.00	2.77	0.11	NA
...
5000	2.50	2.11	NA	0.12

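I need to run the model with undesirable and desirable as the independent variables and each user as the dependent variable, so I need to fit the model 10 times and predict each user's NA values.

Here is my hard-coded version, but I would like to know how to do this with a for loop. I searched and tried several approaches, but none of them worked for me...

The data is stored as "test".

Hard-coded version: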

#fit model
fit_1 = lm(user_1 ~ undesirable + desirable, data = test)
...
fit_10 = lm(user_10 ~ undesirable + desirable, data = test)

#prediction
u_1_na = test[is.na(test$user_1), c('user_1', 'undesirable', 'desirable')]
result1 = predict(fit_1, newdata = u_1_na)
which(result1 == max(result1))
max(result1)
...
u_10_na = test[is.na(test$user_10), c('user_10', 'undesirable', 'desirable')]
result10 = predict(fit_10, newdata = u_10_na)
which(result10 == max(result10))
max(result10)

#write to csv file
#(write each user's maximum predicted value to a csv file)

This is what I am trying now (for loop):

mod_summaries <- list() 

for(i in 1:10) {                 
  
  predictors_i <- colnames(data)[1:10]   
  mod_summaries[[i - 1]] <- summary(     
    lm(predictors_i ~ ., test[ , c("undesirable", 'desirable')]))
  
}

Answer 1

The lapply approach:

mod_summaries_lapply <-
  lapply(
    colnames(mtcars),
    FUN = function(x)
      summary(lm(reformulate(".", response = x), data = mtcars))
  )

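A for-loop approach that fits a linear model for each column. The key is the reformulate() function, which builds a formula from strings. In the question, the formula is written with a character string in place of a variable name, which results in the error "invalid term in model formula". The string has to be converted into a formula first, which is what reformulate() (or as.formula() with paste()) does. This example uses the mtcars dataset.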

mod_summaries <- list() 
for(i in 1:11) {                 
  predictors_i <- colnames(mtcars)[i]   
  mod_summaries[[i]] <- summary(lm(reformulate(".", response = predictors_i), data=mtcars))
  #summary(lm(reformulate(". -1", response = predictors_i), data=mtcars))  # -1 to exclude intercept
  #summary(lm(as.formula(paste(predictors_i, "~ .")), data=mtcars)) # a "paste as formula" method
}
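
Applied to the data in the question, the same reformulate() loop might look like the sketch below. The "test" data frame here is simulated, since the real data was not posted, and only two of the ten user columns are generated. The names user_cols, fits, best, and the CSV file name are made up for illustration.

```r
set.seed(1)
n <- 50

# Simulated stand-in for the asker's `test` data (assumption: real data not shown)
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
test$user_1  <- 0.1 * test$desirable - 0.05 * test$undesirable + rnorm(n, sd = 0.05)
test$user_10 <- 0.2 * test$desirable - 0.10 * test$undesirable + rnorm(n, sd = 0.05)
test$user_1[c(3, 7, 11)]  <- NA   # ratings to be predicted
test$user_10[c(2, 5, 20)] <- NA

user_cols <- grep("^user_", names(test), value = TRUE)
fits <- list()
best <- data.frame(user = user_cols, content = NA_integer_, prediction = NA_real_)

for (i in seq_along(user_cols)) {
  u <- user_cols[i]
  # reformulate() builds user_i ~ undesirable + desirable from strings
  fits[[u]] <- lm(reformulate(c("undesirable", "desirable"), response = u),
                  data = test)
  # predict only the rows where this user's rating is missing
  missing_rows <- test[is.na(test[[u]]), ]
  pred <- predict(fits[[u]], newdata = missing_rows)
  top <- which.max(pred)
  best$content[i]    <- missing_rows$content[top]
  best$prediction[i] <- pred[top]
}

best
# write.csv(best, "best_predictions.csv", row.names = FALSE)  # hypothetical file name
```

lm() drops rows with a missing response by default, so each model is fitted only on the rows that user has rated. which.max() returns the first maximum, whereas which(result == max(result)) in the question would return all ties.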

Answer 2

You can use the as.formula and paste functions to build your formula. Here is an example:

formula_lm <- as.formula(
    paste(response_var, 
          paste(expl_var, collapse = " + "), 
          sep = " ~ "))

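This assumes you have several explanatory variables (joined with + by the inner paste). If you have only one, leave out the second paste.

With the formula created, you can call the lm function like this: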

lm(formula_lm, data)

Edit: in your case, the vector expl_var would contain the undesirable and desirable variables.
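
As a rough sketch of how this could be repeated for every user, the formulas can be built in one pass and stored in a list. The column names user_1 to user_10 follow the question; the names user_cols and formulas are made up for illustration.

```r
expl_var  <- c("undesirable", "desirable")
user_cols <- paste0("user_", 1:10)   # assumed column names, as in the question

# Build one formula per user column from strings
formulas <- lapply(user_cols, function(response_var) {
  as.formula(
    paste(response_var,
          paste(expl_var, collapse = " + "),
          sep = " ~ "))
})
names(formulas) <- user_cols

deparse(formulas[["user_1"]])
# "user_1 ~ undesirable + desirable"
```

Each element can then be passed to lm() together with the data, for example lm(formulas[["user_1"]], data = test).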

Answer 3

Avoid loops. Make your data tidy. Something like:

library(tidyverse)

test %>%
  select(-content) %>%
  pivot_longer(
    starts_with("user"),
    names_to="user",
    values_to="value"
  ) %>%
  group_by(user) %>%
  group_map(
    function(.x, .y) {
      # after pivot_longer() the ratings are in `value`; the grouping column `user` is not part of .x
      summary(lm(value ~ ., data=.x))
    }
  )

Untested code, since your example is not reproducible.
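
A small self-contained sketch of the tidy approach, run on simulated data because the original "test" data frame was not posted. It assumes the dplyr and tidyr packages are installed; the names long, grouped, and fits are made up for illustration, and only two user columns are simulated.

```r
library(dplyr)
library(tidyr)

set.seed(1)
n <- 40

# Simulated stand-in for the asker's `test` data (assumption: real data not shown)
test <- data.frame(
  content     = seq_len(n),
  undesirable = runif(n, 1, 5),
  desirable   = runif(n, 1, 5)
)
test$user_1  <- 0.1 * test$desirable - 0.05 * test$undesirable + rnorm(n, sd = 0.05)
test$user_10 <- 0.2 * test$desirable - 0.10 * test$undesirable + rnorm(n, sd = 0.05)
test$user_1[c(3, 7)]  <- NA
test$user_10[c(2, 5)] <- NA

# One row per (content item, user) pair; ratings end up in `value`
long <- test %>%
  select(-content) %>%
  pivot_longer(starts_with("user"), names_to = "user", values_to = "value")

grouped <- group_by(long, user)

# Fit one model per user; .x holds that user's rows without the `user` column
fits <- group_map(grouped, function(.x, .y) {
  lm(value ~ undesirable + desirable, data = .x)
})
names(fits) <- group_keys(grouped)$user

sapply(fits, coef)   # one column of coefficients per user
```

Keeping the fitted models rather than their summaries leaves them usable with predict() for the rows where value is NA.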
