How to find linear regression on all combinations of columns using data.table
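I am trying to fit a linear regression within each group (Species) of the iris dataset for every possible combination of variables. Since this is a toy example, it is easy to run the regression for each pair of variables separately and concatenate the results. For a data.table with a large number of columns, however, doing this by hand for every pair is impractical.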
library(data.table)
dt = copy(iris)
setDT(dt)[, .(model1 = lm(Sepal.Length ~ Petal.Width, .SD)$coeff[2], model2 = lm(Petal.Width ~ Sepal.Length, .SD)$coeff[2]), by = Species]
Species model1 model2
1: setosa 0.9301727 0.08314444
2: versicolor 1.4263647 0.20935719
3: virginica 0.6508306 0.12141646
setDT(dt)[, .(model1 = lm(Sepal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
Species model1 model2
1: setosa 0.3878739 0.0814112
2: versicolor 0.3743068 0.8393782
3: virginica 0.2343482 0.6863153
setDT(dt)[, .(model1 = lm(Sepal.Width ~ Sepal.Length, .SD)$coeff[2], model2 = lm(Sepal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
Species model1 model2
1: setosa 0.7985283 0.6904897
2: versicolor 0.3197193 0.8650777
3: virginica 0.2318905 0.9015345
setDT(dt)[, .(model1 = lm(Petal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Petal.Width, .SD)$coeff[2]), by = Species]
Species model1 model2
1: setosa 0.2012451 0.5464903
2: versicolor 0.3310536 1.8693247
3: virginica 0.1602970 0.6472593
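Instead of running the linear regression separately for each pair of variables, can this be done easily with data.table? My desired output is as follows -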
Species Variable1 Variable2 model1 model2
setosa Sepal.Length Petal.Width 0.9301727 0.08314444
versicolor Sepal.Length Petal.Width 1.4263647 0.20935719
virginica Sepal.Length Petal.Width 0.6508306 0.12141646
setosa Sepal.Width Petal.Length 0.3878739 0.0814112
versicolor Sepal.Width Petal.Length 0.3743068 0.8393782
virginica Sepal.Width Petal.Length 0.2343482 0.6863153
setosa Sepal.Width Sepal.Length 0.7985283 0.6904897
versicolor Sepal.Width Sepal.Length 0.3197193 0.8650777
virginica Sepal.Width Sepal.Length 0.2318905 0.9015345
setosa Petal.Width Petal.Length 0.2012451 0.5464903
versicolor Petal.Width Petal.Length 0.3310536 1.8693247
virginica Petal.Width Petal.Length 0.1602970 0.6472593
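We can use combn on the column names of iris other than Species to build a list of formulas with reformulate, then loop over that list while grouping by Species, apply lm to each formula and extract coeff.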
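As a small illustration of the two building blocks, the sketch below uses only base R to show what combn and reformulate return for the iris column names. The variable names vars, pairs and f are made up for this example.

```r
# All unordered pairs of the numeric column names in iris
vars  <- names(iris)[-5]                    # drop "Species" (column 5)
pairs <- combn(vars, 2, simplify = FALSE)   # list of character vectors of length 2
length(pairs)                               # choose(4, 2) pairs

# reformulate(termlabels, response) builds the formula response ~ termlabels
f <- reformulate(pairs[[1]][1], response = pairs[[1]][2])
f
```

The first pair is Sepal.Length and Sepal.Width, so f is the formula Sepal.Width ~ Sepal.Length.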
library(data.table)
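# one formula per unordered pair of columns, excluding Species (column 5)
lst1 <- combn(names(iris)[-5], 2, FUN =
    function(x) reformulate(x[1], x[2]), simplify = FALSE)
dt = copy(iris)
# fit each formula within each Species; coeff holds the intercept and the slope,
# so every group contributes two rows
out <- setDT(dt)[, lapply(lst1, function(fmla)
    lm(fmla, .SD)$coeff),
    by = Species]
# name the result columns after their formulas
setnames(out, -1, sapply(lst1, deparse))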
-output
out
Species Sepal.Width ~ Sepal.Length Petal.Length ~ Sepal.Length Petal.Width ~ Sepal.Length Petal.Length ~ Sepal.Width Petal.Width ~ Sepal.Width
1: setosa -0.5694327 0.8030518 -0.17022108 1.1829224 0.02417907
2: setosa 0.7985283 0.1316317 0.08314444 0.0814112 0.06470856
3: versicolor 0.8721460 0.1851155 0.08325571 1.9349223 0.16690570
4: versicolor 0.3197193 0.6864698 0.20935719 0.8393782 0.41844560
5: virginica 1.4463054 0.6104680 1.22610837 3.5108983 0.66405950
6: virginica 0.2318905 0.7500808 0.12141646 0.6863153 0.45794906
Petal.Width ~ Petal.Length
1: -0.04822033
2: 0.20124509
3: -0.08428835
4: 0.33105360
5: 1.13603130
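The result above is wide and holds both the intercept and the slope for each group. To get closer to the long layout asked for in the question (Species, Variable1, Variable2, model1, model2), one possible sketch is to fit both directions for each pair and stack the per-pair results with rbindlist. The helper name fit_pair is hypothetical, and this returns all six pairs rather than only the four listed in the question.

```r
library(data.table)

dt    <- as.data.table(iris)
vars  <- setdiff(names(dt), "Species")
pairs <- combn(vars, 2, simplify = FALSE)

# fit_pair (hypothetical helper): for one pair of columns, return the slope
# of each direction of the regression, computed within each Species
fit_pair <- function(p) {
  dt[, .(
    Variable1 = p[1],
    Variable2 = p[2],
    # model1: Variable1 ~ Variable2, [[2]] picks the slope
    model1 = coef(lm(reformulate(p[2], response = p[1]), data = .SD))[[2]],
    # model2: Variable2 ~ Variable1
    model2 = coef(lm(reformulate(p[1], response = p[2]), data = .SD))[[2]]
  ), by = Species]
}

# one small table per pair, stacked into a single long table
res <- rbindlist(lapply(pairs, fit_pair))
res
```

Each row of res corresponds to one Species and one pair of variables, so filtering to particular pairs is an ordinary subset on Variable1 and Variable2.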