Linear regression model using R
I am new to R, but I am trying to learn. I have a dataset in Excel, and I imported it into R using:
stockPrice<-read.csv("C:/Users/Desktop/prova.csv", sep=";", header=T, check.names = FALSE, stringsAsFactors=FALSE)
The result of the import looks like this. There are 100 rows and columns.
1 2 3 4 5 6 7 8 9 10 11 12
1 -1,8669 -1,2096 1,0358 0,0239 -1,0284 -0,0259 0,8801 0,4778 1,1449 0,4397 -0,1530 -0,3123
2 -2,1469 -0,4331 -0,0891 1,3842 -1,4148 0,1138 -0,8275 0,5115 -1,2898 1,8105 0,8521 -1,4327
3 -1,8919 -0,6469 -0,4098 2,8243 -1,3704 -1,6783 -0,6159 1,2910 -1,4260 2,4720 0,5230 -1,6965
4 -0,7912 0,4075 0,1092 3,8167 -0,9085 -1,0804 0,4104 0,9577 -0,2531 1,1191 1,5688 -0,8727
5 -0,2726 0,1827 0,7973 3,3848 1,0666 1,1254 -1,4111 1,2030 -0,9559 1,7813 1,8331 -1,0933
6 0,0539 -0,8640 2,0607 3,4989 2,1625 0,5226 -1,3890 2,6475 -0,6684 0,4587 0,7694 0,3462
7 0,6813 -1,9639 0,1362 1,9797 2,8645 -0,1524 -1,2367 4,6739 -1,7459 2,2648 1,8341 -0,4107
8 -0,4228 -0,3357 0,1201 2,1603 4,2053 -0,3679 -0,5577 3,7251 -1,6288 2,0168 1,1571 -0,8601
9 0,3020 -0,0523 1,4912 2,6993 5,2069 -0,0497 -0,3139 3,2010 -1,1773 1,8993 0,3357 -3,4239
10 -0,0832 0,2051 2,2387 2,9303 6,1984 1,9706 -0,3759 2,7283 -2,1752 2,0772 0,3298 -4,3092
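I have copied only part of the dataset. Each column is an asset. What I want to do is fit a linear regression for each asset: for the first asset, for example, y would be rows 1 to 9 of column 1 and x would be rows 2 to 10 of the same column. I need to do this for every asset, and I only need the value of the coefficient.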
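One option is to loop over the columns with lapply, extract 'x' and 'y', and fit the model with lm, after stripping the "," with sub and converting the result to numeric type. Note that stripping the comma, rather than replacing it with ".", multiplies every value in this sample by 10^4 (each has four decimal places): the slope is unaffected, but the intercept comes out 10^4 times larger than it would on the original scale.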
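out <- lapply(stockPrice, function(vec) {
  # strip the decimal comma and convert to numeric ("-1,8669" becomes -18669)
  vec <- as.numeric(sub(",", "", vec))
  y <- vec[1:(length(vec)-1)]  # rows 1 to n-1
  x <- vec[2:length(vec)]      # rows 2 to n
  coef(lm(y ~ x))              # intercept and slope
})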
out[[1]]
# (Intercept) x
#-2798.4234922 0.8392437
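To illustrate the effect of how the decimal comma is handled, here is a minimal sketch using only the first asset from the sample data. The helper name lag_coef is made up for this illustration; it is not part of the answer above. It compares stripping the comma with replacing it by ".": the slopes should agree, while the intercepts should differ by a factor of 10^4.

```r
# First asset from the sample data, as imported (character, decimal comma)
asset1 <- c("-1,8669", "-2,1469", "-1,8919", "-0,7912", "-0,2726",
            "0,0539", "0,6813", "-0,4228", "0,3020", "-0,0832")

# Hypothetical helper: regress rows 1..n-1 (y) on rows 2..n (x)
lag_coef <- function(vec) {
  y <- vec[-length(vec)]
  x <- vec[-1]
  coef(lm(y ~ x))
}

# Comma removed: values are scaled by 10^4 (-18669, -21469, ...)
stripped <- as.numeric(sub(",", "", asset1, fixed = TRUE))
# Comma replaced by a decimal point: values keep their scale (-1.8669, ...)
converted <- as.numeric(sub(",", ".", asset1, fixed = TRUE))

coef_stripped <- lag_coef(stripped)
coef_converted <- lag_coef(converted)

coef_stripped
coef_converted
```

If the intercept matters, the second conversion is probably the one you want; if only the slope is needed, either gives the same number.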
If we want the slope, then rbind the list elements and extract the second column
do.call(rbind, out)[, 2]
# 1 2 3 4 5 6 7 8 9 10 11
# 0.83924375 0.21597272 0.21761992 0.95551414 0.86204662 0.10036499 0.02051160 0.84014384 0.01129873 -0.31601104 0.18362571
# 12
# 0.46161256
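As a possible alternative to rbind-ing and indexing by position, the slope can be pulled out of each element by name with vapply. This is only a sketch: the list assets below is a stand-in built from the first two columns of the sample data (already converted to numeric), and out is rebuilt from it so that the example runs on its own.

```r
# Stand-in data: the first two assets from the sample, already numeric
assets <- list(
  `1` = c(-1.8669, -2.1469, -1.8919, -0.7912, -0.2726,
          0.0539, 0.6813, -0.4228, 0.3020, -0.0832),
  `2` = c(-1.2096, -0.4331, -0.6469, 0.4075, 0.1827,
          -0.8640, -1.9639, -0.3357, -0.0523, 0.2051)
)

# One coefficient vector per asset, same pairing as above
out <- lapply(assets, function(vec) {
  y <- vec[-length(vec)]  # rows 1 to n-1
  x <- vec[-1]            # rows 2 to n
  coef(lm(y ~ x))
})

# Extract the slope by name; vapply enforces one numeric value per asset
slopes <- vapply(out, function(cf) cf[["x"]], numeric(1))
slopes
```

Indexing by the name "x" rather than by column 2 keeps working if the order of the coefficients ever changes.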
Checking the output for the first column in Excel (the screenshots of the data and of the output are not reproduced here).
In addition to the above, we can also make use of lmList from nlme after reshaping to 'long' format
library(dplyr)
library(tidyr)
library(nlme)
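stockPrice %>%
  # parse_number's default locale treats "," as a grouping mark, so "-1,8669" becomes -18669
  mutate_all(readr::parse_number) %>%
  pivot_longer(everything(), values_to = 'y') %>%
  group_by(name = factor(name, levels = unique(name))) %>%
  mutate(x = lead(y)) %>%   # x is the next row's value within each asset
  ungroup %>%
  na.omit %>%               # drops the last row of each asset, where x is NA
  lmList(y ~ x|name, data = .)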
#Call:
# Model: y ~ x | name
# Data: .
#Coefficients:
# (Intercept) x
#1 -2798.423 0.83924375
#2 -4621.407 0.21597272
#3 4274.414 0.21761992
#4 -2009.506 0.95551414
#5 -5269.101 0.86204662
#6 -1814.797 0.10036499
#7 -5479.691 0.02051160
#8 1218.587 0.84014384
#9 -8747.104 0.01129873
#10 21429.645 -0.31601104
#11 7811.527 0.18362571
#12 -3786.098 0.46161256
#Degrees of freedom: 108 total; 84 residual
#Residual standard error: 8419.53
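Since only the coefficients are needed, they can be taken from the fitted lmList object with coef(), which returns one row per group. The sketch below assumes two assets from the sample data, already numeric and already arranged in 'long' format by hand, so it depends only on nlme; the object names long, fit and cf are illustrative.

```r
library(nlme)  # a recommended package, normally installed alongside R

# First two assets from the sample data, already numeric
v1 <- c(-1.8669, -2.1469, -1.8919, -0.7912, -0.2726,
        0.0539, 0.6813, -0.4228, 0.3020, -0.0832)
v2 <- c(-1.2096, -0.4331, -0.6469, 0.4075, 0.1827,
        -0.8640, -1.9639, -0.3357, -0.0523, 0.2051)

# 'Long' format: y is rows 1..9 and x is rows 2..10 of each asset
long <- data.frame(
  name = factor(rep(c("1", "2"), each = 9)),
  y = c(v1[-10], v2[-10]),
  x = c(v1[-1], v2[-1])
)

fit <- lmList(y ~ x | name, data = long)

# coef() gives a data frame with one row per asset
cf <- coef(fit)
slopes <- setNames(cf[["x"]], rownames(cf))
slopes
```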
Data used above:
stockPrice <- structure(list(`1` = c("-1,8669", "-2,1469", "-1,8919", "-0,7912",
"-0,2726", "0,0539", "0,6813", "-0,4228", "0,3020", "-0,0832"
), `2` = c("-1,2096", "-0,4331", "-0,6469", "0,4075", "0,1827",
"-0,8640", "-1,9639", "-0,3357", "-0,0523", "0,2051"), `3` = c("1,0358",
"-0,0891", "-0,4098", "0,1092", "0,7973", "2,0607", "0,1362",
"0,1201", "1,4912", "2,2387"), `4` = c("0,0239", "1,3842", "2,8243",
"3,8167", "3,3848", "3,4989", "1,9797", "2,1603", "2,6993", "2,9303"
), `5` = c("-1,0284", "-1,4148", "-1,3704", "-0,9085", "1,0666",
"2,1625", "2,8645", "4,2053", "5,2069", "6,1984"), `6` = c("-0,0259",
"0,1138", "-1,6783", "-1,0804", "1,1254", "0,5226", "-0,1524",
"-0,3679", "-0,0497", "1,9706"), `7` = c("0,8801", "-0,8275",
"-0,6159", "0,4104", "-1,4111", "-1,3890", "-1,2367", "-0,5577",
"-0,3139", "-0,3759"), `8` = c("0,4778", "0,5115", "1,2910",
"0,9577", "1,2030", "2,6475", "4,6739", "3,7251", "3,2010", "2,7283"
), `9` = c("1,1449", "-1,2898", "-1,4260", "-0,2531", "-0,9559",
"-0,6684", "-1,7459", "-1,6288", "-1,1773", "-2,1752"), `10` = c("0,4397",
"1,8105", "2,4720", "1,1191", "1,7813", "0,4587", "2,2648", "2,0168",
"1,8993", "2,0772"), `11` = c("-0,1530", "0,8521", "0,5230",
"1,5688", "1,8331", "0,7694", "1,8341", "1,1571", "0,3357", "0,3298"
), `12` = c("-0,3123", "-1,4327", "-1,6965", "-0,8727", "-1,0933",
"0,3462", "-0,4107", "-0,8601", "-3,4239", "-4,3092")),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
As an alternative to @akrun's answer, you can use apply instead of lapply. Using a fake example m:
m <- data.frame(matrix(rnorm(100), ncol = 10, nrow = 10))
> head(m[1:3,1:3])
X1 X2 X3
1 -0.81150290 0.3196615 -0.70848803
2 1.39105642 0.8232761 0.02241253
3 -0.01187938 0.9158422 -0.21934718
You can do:
coeff = apply(m, 2, function(x) lm(x[1:9] ~ x[2:10])$coefficients[2])
and get a vector containing the coefficient computed for each asset:
> coeff
X1 X2 X3 X4 X5 X6
-0.19160847 -0.52686830 0.36973049 0.29217668 -0.70102686 0.22142335
X7 X8 X9 X10
-0.13817910 -0.14292086 0.05105796 -0.22829763
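For a regression with a single predictor, the slope equals cov(x, y) / var(x), so the same numbers can be obtained without fitting a model per column. The sketch below checks this against the lm-based version on simulated data; the seed is arbitrary and is there only to make the run reproducible, and the names slope_lm and slope_cf are illustrative.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
m <- data.frame(matrix(rnorm(100), ncol = 10, nrow = 10))

# Slope from lm(), as in the answer above
slope_lm <- apply(m, 2, function(x) lm(x[1:9] ~ x[2:10])$coefficients[2])

# Same slope from the closed form cov(x, y) / var(x),
# with y = x[1:9] and predictor x[2:10]
slope_cf <- apply(m, 2, function(x) cov(x[1:9], x[2:10]) / var(x[2:10]))

# The two vectors should agree up to floating-point error
isTRUE(all.equal(unname(slope_lm), unname(slope_cf)))
```

This shortcut only yields the slope; if standard errors or the intercept are needed as well, lm remains the more convenient route.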
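By the way, when you read the dataset with read.table (or read.csv, as here), you should add the argument dec = "," so that you do not have to deal with non-numeric values. Something like: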
stockPrice<-read.csv("C:/Users/Desktop/prova.csv", sep=";", header=T, check.names = FALSE, stringsAsFactors=FALSE, dec = ",")
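To see what dec = "," changes, here is a small sketch that reads a few values of the sample data from a string instead of a file. The layout of csv_text (a header row of column numbers and ";" as separator) is an assumption inferred from the read.csv call above, not the actual contents of prova.csv.

```r
# Assumed file layout: ';' separator, decimal comma, numeric column headers
csv_text <- "1;2;3
-1,8669;-1,2096;1,0358
-2,1469;-0,4331;-0,0891"

# With dec = ",", the columns are read directly as numeric
with_dec <- read.csv(text = csv_text, sep = ";", header = TRUE,
                     check.names = FALSE, dec = ",")

# Without it, the values stay as character strings such as "-1,8669"
without_dec <- read.csv(text = csv_text, sep = ";", header = TRUE,
                        check.names = FALSE, stringsAsFactors = FALSE)

sapply(with_dec, class)
sapply(without_dec, class)

# read.csv2 uses sep = ";" and dec = "," as its defaults
same <- read.csv2(text = csv_text, check.names = FALSE)
```

With the columns already numeric, the sub or parse_number step in the earlier answers is no longer needed.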