简体   繁体   English

R中的循环 - 线性回归

[英]Loops in R - linear regression

I've just started out learning R and can't seem to get this loop to work.我刚刚开始学习 R,似乎无法让这个循环工作。 I have a data frame containing 250 rows and 503 columns (y) and another data frame containing 250 rows and 1 column (x).我有一个包含 250 行和 503 列 (y) 的数据框和另一个包含 250 行和 1 列 (x) 的数据框。

I am trying to get a loop to run 503 separate regressions without having to type it in individually ie.我试图让一个循环运行 503 个单独的回归,而不必单独输入它。

(output_1 <- lm(y$1st column ~ x))
(output_2 <- lm(y$2nd column ~ x))

across all 250 rows in each regression.在每个回归的所有 250 行中。

I tried this loop:我试过这个循环:

for (i in 1:503) {
output_loop <- lm(y[,i]~x)
}
output_total <- cbind(output$coefficients)

but this only gave me one intercept and one coefficient, as opposed to 503 intercepts and 503 coefficients.但这只给了我一个截距和一个系数,而不是 503 个截距和 503 个系数。

The rows of each data frame have time markers that are aligned in the format yyyy-mm-dd but I don't believe this affects the regression as the intercept and coefficients output sought are independent of time.每个数据帧的行都具有以 yyyy-mm-dd 格式对齐的时间标记,但我认为这不会影响回归,因为所寻求的截距和系数输出与时间无关。

I have also tried using a basic lm:我也尝试过使用基本的lm:

(output <- lm(y~x))
 output_total <- cbind(output$coefficients)

and this gives 503 intercepts and 503 coefficients, however the output is wrong when I spot checked the output against some columns (running individual regression as above).这给出了 503 个截距和 503 个系数,但是当我根据某些列检查输出时输出是错误的(如上所述运行单个回归)。

Any help on this loop is much appreciated!非常感谢对此循环的任何帮助!

Thank you谢谢

I'm not sure you're approaching this the best way, but here's something that I think will achieve what you described.我不确定您是否以最佳方式接近此方法,但我认为这将实现您所描述的内容。

# create some toy data to match your description

set.seed(340)
y <- data.frame(replicate(503, runif(250, 0, 1)))
x <- data.frame(v1=runif(250, 0, 1))


out <- data.frame(NULL)              # create object to keep results
for (i in 1:length(y)) {
  m <- summary(lm(y[,i] ~ x[,1]))    # run model
  out[i, 1] <- names(y)[i]           # print variable name
  out[i, 2] <- m$coefficients[1,1]   # intercept
  out[i, 3] <- m$coefficients[2,1]   # coefficient
}
names(out) <- c("y.variable", "intercept", "coef.x")
head(out)

# y.variable intercept       coef.x
# 1         X1 0.4841710 -0.015186852
# 2         X2 0.4972775 -0.002306964
# 3         X3 0.4410326  0.096450185
# 4         X4 0.4547249  0.041582039
# 5         X5 0.5039661  0.062429142
# 6         X6 0.5331573 -0.092806309

Your loop is close you just need to create a place to catch the results.你的循环很接近,你只需要创建一个地方来捕捉结果。

output_loop=list(NA)
for (i in 1:503) {
output_loop[[i]] <- lm(y[,i]~x)

}

If you just want the coefficients in a data.frame then restructure things to catch just the two coefficients from each model如果您只想要 data.frame 中的系数,则重组事物以仅捕获每个模型中的两个系数

output_loop=data.frame(int=NA,slope=NA)
for (i in 1:503) {
  output_loop[i,] <- coefficients(lm(y[,i]~x))

}

Here is a "tidy" solution based on the tidyverse and broom packages:这是一个基于 tidyverse 和 broom 包的“整洁”解决方案:

library(tidyverse)
library(broom)


set.seed(340)
y <- data.frame(replicate(503, runif(250, 0, 1)))
x <- data.frame(v1=runif(250, 0, 1))

y %>%
    pivot_longer(everything(), names_to = "variable", values_to = "y") %>%
    split(.$variable) %>%
    map(., ~ mutate(., x = x$v1)) %>% # Note that v1 is the name of MY data frame x
    map(., ~ lm(y ~ x, data = .)) %>%
    map(tidy) %>% # This bit here extracts the model parameters. If you want any other information, you could use summary()
    imap_dfr(., ~ mutate(.x, variable = .y))

#> # A tibble: 1,006 x 6
#>    term         estimate std.error statistic  p.value variable
#>    <chr>           <dbl>     <dbl>     <dbl>    <dbl> <chr>   
#>  1 (Intercept)  0.484       0.0361  13.4     3.72e-31 X1      
#>  2 x           -0.0152      0.0631  -0.241   8.10e- 1 X1      
#>  3 (Intercept)  0.499       0.0356  14.0     2.84e-33 X10     
#>  4 x           -0.00601     0.0622  -0.0967  9.23e- 1 X10     
#>  5 (Intercept)  0.492       0.0344  14.3     3.03e-34 X100    
#>  6 x            0.000321    0.0600   0.00534 9.96e- 1 X100    
#>  7 (Intercept)  0.530       0.0361  14.7     1.65e-35 X101    
#>  8 x           -0.0972      0.0631  -1.54    1.25e- 1 X101    
#>  9 (Intercept)  0.534       0.0357  15.0     1.77e-36 X102    
#> 10 x           -0.0144      0.0623  -0.231   8.18e- 1 X102    
#> # ... with 996 more rows

Created on 2019-12-04 by the reprex package (v0.3.0)reprex 包(v0.3.0)于 2019 年 12 月 4 日创建

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM