
Fitting a multiple linear regression in R

So I have data like this -

##      V2   V3     V4   V5   V6   V7    V8
## 2  27.0 41.3 2948.0 26.2 51.7 42.7  89.8
## 3  22.9 66.7 4644.0  3.0 45.7 41.8 121.3
## 4  26.3 58.1 3665.0  3.0 50.8 38.5 115.2
## 5  29.1 39.9 2878.0 18.3 51.5 38.8 100.3
## 6  28.1 62.6 4493.0  7.0 50.8 39.7 123.0
## 7  26.2 63.9 3855.0  3.0 50.7 31.1 124.8

I want to do a multiple linear regression -

model1 = lm(cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + cigarette.data$V7, data = cigarette.data)

But this gives me -

## 
## Call:
## lm(formula = cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + 
##     cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + 
##     cigarette.data$V7, data = cigarette.data)
## 
## Residuals:
## ALL 51 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (186 not defined because of singularities)
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   19         NA      NA       NA
## cigarette.data$V223.1         20         NA      NA       NA
## cigarette.data$V223.9         23         NA      NA       NA
## cigarette.data$V224.8        -16         NA      NA       NA
## cigarette.data$V225.0         21         NA      NA       NA
## cigarette.data$V225.1         25         NA      NA       NA
## cigarette.data$V225.9         -9         NA      NA       NA
## cigarette.data$V226.2          8         NA      NA       NA

Which seems wrong. What's going on?

The problem is that you are fitting a model with more coefficients than samples (i.e. rows). Your example contains 6 samples, so 5 predictors plus the intercept (6 coefficients in total) already reproduce the response V8 perfectly, leaving no residual degrees of freedom:

cigarette.data <- structure(list(V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2), V3 = c(41.3, 
66.7, 58.1, 39.9, 62.6, 63.9), V4 = c(2948, 4644, 3665, 2878, 
4493, 3855), V5 = c(26.2, 3, 3, 18.3, 7, 3), V6 = c(51.7, 45.7, 
50.8, 51.5, 50.8, 50.7), V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 
31.1), V8 = c(89.784450178314, 121.359442280557, 115.031032135658, 
100.201279353697, 123.401631728502, 124.750887806)), .Names = c("V2", 
"V3", "V4", "V5", "V6", "V7", "V8"), row.names = c(NA, -6L), class = "data.frame")

fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
summary(fit)


Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.89203         NA      NA       NA
V2           5.66196         NA      NA       NA
V3           2.16574         NA      NA       NA
V4          -0.01412         NA      NA       NA
V5           0.03093         NA      NA       NA
V6          -4.07376         NA      NA       NA
V7                NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 5 and 0 DF,  p-value: NA
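
As a rough sketch of where the zero degrees of freedom come from, the counts can be read off the fitted object. The names `n_obs`, `n_coef`, `n_est` and `df_resid` are illustrative only, and the six-row data is rebuilt with `data.frame()` so the block runs on its own.

```r
# Six-row example from the answer, rebuilt so this block is self-contained.
cigarette.data <- data.frame(
  V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2),
  V3 = c(41.3, 66.7, 58.1, 39.9, 62.6, 63.9),
  V4 = c(2948, 4644, 3665, 2878, 4493, 3855),
  V5 = c(26.2, 3, 3, 18.3, 7, 3),
  V6 = c(51.7, 45.7, 50.8, 51.5, 50.8, 50.7),
  V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 31.1),
  V8 = c(89.784450178314, 121.359442280557, 115.031032135658,
         100.201279353697, 123.401631728502, 124.750887806)
)

fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

n_obs    <- nrow(cigarette.data)  # number of samples (rows)
n_coef   <- length(coef(fit))     # intercept + 6 predictors, NA ones included
n_est    <- fit$rank              # coefficients that could actually be estimated
df_resid <- df.residual(fit)      # n_obs - n_est

print(c(n_obs = n_obs, n_coef = n_coef, n_est = n_est, df_resid = df_resid))
```

Six coefficients are estimated from six rows, so `df_resid` is 0, and the seventh coefficient (V7) is the one reported as NA.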

Your model needs either fewer predictors or more samples. For example, with four predictors (see example below):

fit <- lm(V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)
summary(fit)

Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)

Residuals:
      1       2       3       4       5       6 
-1.1873  0.9570 -2.9738  1.9870 -0.7142  1.9312 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.846025  57.297709   0.311    0.808
V2           1.848628   1.240164   1.491    0.376
V3           0.802375   0.879204   0.913    0.529
V4           0.001821   0.008315   0.219    0.863
V5          -0.583697   0.601185  -0.971    0.509

Residual standard error: 4.4 on 1 degrees of freedom
Multiple R-squared:  0.981, Adjusted R-squared:  0.9052 
F-statistic: 12.94 on 4 and 1 DF,  p-value: 0.2052
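
One more thing worth checking in the original output: coefficient names such as `cigarette.data$V223.1` are the term `cigarette.data$V2` followed by the value `23.1`, which is how R labels the dummy variables it creates for a factor or character column. That would explain how 51 rows end up with far more coefficients than samples. The sketch below assumes the columns were read as text (this can happen, for example, when a header line is read as a data row); the `raw` and `clean` data frames are hypothetical and reuse a few values from the question's printout.

```r
# Hypothetical illustration: numeric-looking values stored as factors.
raw <- data.frame(
  V2 = c("27.0", "22.9", "26.3", "29.1", "28.1", "26.2"),
  V3 = c("41.3", "66.7", "58.1", "39.9", "62.6", "63.9"),
  V8 = c("89.8", "121.3", "115.2", "100.3", "123.0", "124.8"),
  stringsAsFactors = TRUE
)

# Check the column types first; here no column is numeric.
is_num_before <- sapply(raw, is.numeric)

# Convert via character so the values are kept, not the factor's integer codes.
clean <- as.data.frame(lapply(raw, function(x) as.numeric(as.character(x))))
is_num_after <- sapply(clean, is.numeric)

# A factor predictor expands to one design-matrix column per extra level;
# a numeric predictor contributes a single slope column.
n_cols_factor  <- ncol(model.matrix(~ V2, data = raw))
n_cols_numeric <- ncol(model.matrix(~ V2, data = clean))

print(c(factor = n_cols_factor, numeric = n_cols_numeric))
```

If `str(cigarette.data)` shows factor or character columns in the real data, converting them this way (or re-reading the file so the header is parsed as a header) should bring the coefficient count back to one per predictor.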

There may be a null value or 0.0 in one of the records in your data frame. Try imputing those records or removing them from the data frame before fitting the model.
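
A minimal sketch of that check, assuming the missing entries show up as `NA` once the data is loaded. The data frame `df` is hypothetical, and mean imputation is used only as the simplest possible stand-in for an imputation method.

```r
# Hypothetical data frame with one missing value, for illustration only.
df <- data.frame(
  V2 = c(27.0, 22.9, NA, 29.1, 28.1, 26.2),
  V3 = c(41.3, 66.7, 58.1, 39.9, 62.6, 63.9),
  V8 = c(89.8, 121.3, 115.2, 100.3, 123.0, 124.8)
)

# Count missing values per column to locate the problem records.
na_per_column <- colSums(is.na(df))

# Option 1: drop every row that contains a missing value.
df_complete <- df[complete.cases(df), ]

# Option 2: replace missing V2 values with the column mean (crude imputation).
df_imputed <- df
df_imputed$V2[is.na(df_imputed$V2)] <- mean(df$V2, na.rm = TRUE)

print(na_per_column)
```

Note that `lm()` drops incomplete rows by default (`na.action = na.omit`), so doing this explicitly mainly makes it visible how many samples actually remain for the fit.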
