
How to find linear regression on all combinations of columns using data.table

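I am trying to fit a linear regression within each group (Species) of the iris dataset for every possible pair of variables. Because this is a toy example, it is easy to run the regression for each pair of variables separately and concatenate the results. For a data.table with a large number of columns, however, doing this by hand for every pair becomes impractical.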

library(data.table)
  dt = copy(iris)
  setDT(dt)[, .(model1 = lm(Sepal.Length ~ Petal.Width, .SD)$coeff[2], model2 = lm(Petal.Width ~ Sepal.Length, .SD)$coeff[2]), by = Species]
      Species    model1     model2
1:     setosa 0.9301727 0.08314444
2: versicolor 1.4263647 0.20935719
3:  virginica 0.6508306 0.12141646

  setDT(dt)[, .(model1 = lm(Sepal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.3878739 0.0814112
2: versicolor 0.3743068 0.8393782
3:  virginica 0.2343482 0.6863153

  setDT(dt)[, .(model1 = lm(Sepal.Width ~ Sepal.Length, .SD)$coeff[2], model2 = lm(Sepal.Length ~ Sepal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.7985283 0.6904897
2: versicolor 0.3197193 0.8650777
3:  virginica 0.2318905 0.9015345

  setDT(dt)[, .(model1 = lm(Petal.Width ~ Petal.Length, .SD)$coeff[2], model2 = lm(Petal.Length ~ Petal.Width, .SD)$coeff[2]), by = Species]
      Species    model1    model2
1:     setosa 0.2012451 0.5464903
2: versicolor 0.3310536 1.8693247
3:  virginica 0.1602970 0.6472593

Instead of running the regression separately for each pair of variables, can this be done easily with data.table? My desired output is as follows:

      Species   Variable1   Variable2     model1     model2
       setosa Sepal.Length  Petal.Width   0.9301727 0.08314444
   versicolor Sepal.Length  Petal.Width   1.4263647 0.20935719
    virginica Sepal.Length  Petal.Width   0.6508306 0.12141646
       setosa Sepal.Width   Petal.Length  0.3878739 0.0814112
   versicolor Sepal.Width   Petal.Length  0.3743068 0.8393782
    virginica Sepal.Width   Petal.Length  0.2343482 0.6863153
       setosa Sepal.Width   Sepal.Length  0.7985283 0.6904897
   versicolor Sepal.Width   Sepal.Length  0.3197193 0.8650777
    virginica Sepal.Width   Sepal.Length  0.2318905 0.9015345
       setosa Petal.Width   Petal.Length  0.2012451 0.5464903
   versicolor Petal.Width   Petal.Length  0.3310536 1.8693247
    virginica Petal.Width   Petal.Length  0.1602970 0.6472593

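We can use combn to build a list of formulas, calling reformulate on each pair of the iris column names other than Species. We then loop over that list of formulas within each Species group, apply lm, and extract coeff. Because coeff holds both the intercept and the slope, each Species gets two rows in the result: the intercept first, then the slope.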

library(data.table)
lst1 <- combn(names(iris)[-5], 2, FUN = 
      function(x) reformulate(x[1], x[2]), simplify = FALSE)
dt = copy(iris)
out <- setDT(dt)[, lapply(lst1, function(fmla) 
        lm(fmla, .SD)$coeff), 
       by = Species]
setnames(out, -1, sapply(lst1, deparse))
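The formula list can also be inspected on its own before it is passed to the grouped call. The following is a minimal sketch using only base R; the name vars is introduced here for illustration and is not part of the original answer.

```r
# Numeric iris columns (everything except the 5th column, Species)
vars <- names(iris)[-5]

# One formula per unordered pair; reformulate(x[1], x[2]) uses the first
# name as the predictor and the second as the response
lst1 <- combn(vars, 2, FUN = function(x) reformulate(x[1], x[2]),
              simplify = FALSE)

length(lst1)           # 6, i.e. choose(4, 2) pairs
sapply(lst1, deparse)  # the character labels later passed to setnames()
```

Note that combn yields each pair only once, so this list covers one direction per pair (for example Sepal.Width ~ Sepal.Length but not the reverse).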

Output of the grouped call:

out
      Species Sepal.Width ~ Sepal.Length Petal.Length ~ Sepal.Length Petal.Width ~ Sepal.Length Petal.Length ~ Sepal.Width Petal.Width ~ Sepal.Width
1:     setosa                 -0.5694327                   0.8030518                -0.17022108                  1.1829224                0.02417907
2:     setosa                  0.7985283                   0.1316317                 0.08314444                  0.0814112                0.06470856
3: versicolor                  0.8721460                   0.1851155                 0.08325571                  1.9349223                0.16690570
4: versicolor                  0.3197193                   0.6864698                 0.20935719                  0.8393782                0.41844560
5:  virginica                  1.4463054                   0.6104680                 1.22610837                  3.5108983                0.66405950
6:  virginica                  0.2318905                   0.7500808                 0.12141646                  0.6863153                0.45794906
   Petal.Width ~ Petal.Length
1:                -0.04822033
2:                 0.20124509
3:                -0.08428835
4:                 0.33105360
5:                 1.13603130
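The result above is in wide form, whereas the question asks for one row per Species and variable pair, with the slopes in both directions. One possible way to get that layout is sketched below. The helper slope and the names pairs and res are hypothetical and not part of the original answer. The sketch also returns all six pairs in combn order, so the row and Variable1/Variable2 order may differ from the table in the question.

```r
library(data.table)

dt <- as.data.table(iris)
vars <- setdiff(names(dt), "Species")

# All unordered pairs of numeric columns, as a list of length-2 vectors
pairs <- combn(vars, 2, simplify = FALSE)

# Hypothetical helper: slope of y ~ x fitted on the rows in d
slope <- function(y, x, d) {
  unname(coef(lm(reformulate(x, y), data = d))[2])
}

# For each pair, fit both directions within each Species, then stack
res <- rbindlist(lapply(pairs, function(p) {
  dt[, .(Variable1 = p[1],
         Variable2 = p[2],
         model1 = slope(p[1], p[2], .SD),   # Variable1 ~ Variable2
         model2 = slope(p[2], p[1], .SD)),  # Variable2 ~ Variable1
     by = Species]
}))

res[Variable1 == "Sepal.Length" & Variable2 == "Petal.Width"]
```

Here model1 and model2 follow the convention used in the question: model1 regresses Variable1 on Variable2, and model2 is the reverse.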

