So I have data like this -
## V2 V3 V4 V5 V6 V7 V8
## 2 27.0 41.3 2948.0 26.2 51.7 42.7 89.8
## 3 22.9 66.7 4644.0 3.0 45.7 41.8 121.3
## 4 26.3 58.1 3665.0 3.0 50.8 38.5 115.2
## 5 29.1 39.9 2878.0 18.3 51.5 38.8 100.3
## 6 28.1 62.6 4493.0 7.0 50.8 39.7 123.0
## 7 26.2 63.9 3855.0 3.0 50.7 31.1 124.8
I want to do a multiple linear regression -
model1 = lm(cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + cigarette.data$V7, data = cigarette.data)
But this gives me -
##
## Call:
## lm(formula = cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 +
## cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 +
## cigarette.data$V7, data = cigarette.data)
##
## Residuals:
## ALL 51 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (186 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19 NA NA NA
## cigarette.data$V223.1 20 NA NA NA
## cigarette.data$V223.9 23 NA NA NA
## cigarette.data$V224.8 -16 NA NA NA
## cigarette.data$V225.0 21 NA NA NA
## cigarette.data$V225.1 25 NA NA NA
## cigarette.data$V225.9 -9 NA NA NA
## cigarette.data$V226.2 8 NA NA NA
Which seems wrong. What's going on?
The problem is that you are fitting a model with more predictor variables than samples (i.e., rows). Your example contains 6 samples, so 5 variables plus the intercept (6 coefficients) already predict V8 perfectly:
cigarette.data <- structure(list(V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2), V3 = c(41.3,
66.7, 58.1, 39.9, 62.6, 63.9), V4 = c(2948, 4644, 3665, 2878,
4493, 3855), V5 = c(26.2, 3, 3, 18.3, 7, 3), V6 = c(51.7, 45.7,
50.8, 51.5, 50.8, 50.7), V7 = c(42.7, 41.8, 38.5, 38.8, 39.7,
31.1), V8 = c(89.784450178314, 121.359442280557, 115.031032135658,
100.201279353697, 123.401631728502, 124.750887806)), .Names = c("V2",
"V3", "V4", "V5", "V6", "V7", "V8"), row.names = c(NA, -6L), class = "data.frame")
fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
summary(fit)
Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.89203 NA NA NA
V2 5.66196 NA NA NA
V3 2.16574 NA NA NA
V4 -0.01412 NA NA NA
V5 0.03093 NA NA NA
V6 -4.07376 NA NA NA
V7 NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 5 and 0 DF, p-value: NA
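A quick way to confirm this before reading the summary is to compare the number of rows with the number of coefficients the model tries to estimate. The following is a minimal sketch using the six rows above; the names `fit_full`, `n_obs`, `n_coef` and `n_estimable` are only illustrative.

```r
# The six example rows from the answer above
cigarette.data <- data.frame(
  V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2),
  V3 = c(41.3, 66.7, 58.1, 39.9, 62.6, 63.9),
  V4 = c(2948, 4644, 3665, 2878, 4493, 3855),
  V5 = c(26.2, 3, 3, 18.3, 7, 3),
  V6 = c(51.7, 45.7, 50.8, 51.5, 50.8, 50.7),
  V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 31.1),
  V8 = c(89.784450178314, 121.359442280557, 115.031032135658,
         100.201279353697, 123.401631728502, 124.750887806)
)

fit_full <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

n_obs       <- nrow(cigarette.data)    # rows available to the fit
n_coef      <- length(coef(fit_full))  # coefficients requested, including intercept
n_estimable <- fit_full$rank           # coefficients lm() could actually estimate

# Residual degrees of freedom = rows - estimable coefficients
df.residual(fit_full)
```

When `df.residual()` returns 0, the fit passes through every point exactly, which is why the standard errors, t values and p-values are all `NA`.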
Your model needs either fewer variables or more samples. For example, with four predictors:
fit <- lm(V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)
summary(fit)
Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)
Residuals:
1 2 3 4 5 6
-1.1873 0.9570 -2.9738 1.9870 -0.7142 1.9312
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.846025 57.297709 0.311 0.808
V2 1.848628 1.240164 1.491 0.376
V3 0.802375 0.879204 0.913 0.529
V4 0.001821 0.008315 0.219 0.863
V5 -0.583697 0.601185 -0.971 0.509
Residual standard error: 4.4 on 1 degrees of freedom
Multiple R-squared: 0.981, Adjusted R-squared: 0.9052
F-statistic: 12.94 on 4 and 1 DF, p-value: 0.2052
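Coefficient names such as `cigarette.data$V223.1` in the question's output suggest one further way to end up with more coefficients than rows: when a column is stored as a factor or character, `lm()` estimates one coefficient per distinct value rather than a single slope. Whether that applies to the original 51-row data is an assumption, since the file itself is not shown. The sketch below uses a small hypothetical data frame, `demo`, to show the effect and one possible conversion.

```r
# Hypothetical data: V2 holds numbers stored as text and converted to a factor
demo <- data.frame(
  V2 = c("27.0", "22.9", "26.3", "29.1", "28.1", "26.2"),
  V8 = c(89.8, 121.3, 115.2, 100.3, 123.0, 124.8),
  stringsAsFactors = TRUE
)

# Inspect the column types; a numeric predictor should not show up as "factor"
col_classes <- sapply(demo, class)

# With a factor predictor, lm() fits one dummy coefficient per level
fit_factor <- lm(V8 ~ V2, data = demo)
n_coef_factor <- length(coef(fit_factor))

# Convert via character so the printed values, not the level codes, are used
demo$V2 <- as.numeric(as.character(demo$V2))
fit_numeric <- lm(V8 ~ V2, data = demo)
n_coef_numeric <- length(coef(fit_numeric))

df.residual(fit_numeric)
```

Going through `as.character()` first matters: calling `as.numeric()` directly on a factor returns the internal level codes instead of the original numbers.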
There is probably a null (NA) value or a 0.0 in one of the records in your data frame. Try imputing those records or removing them from the data frame before fitting the model.
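A check along these lines could look like the sketch below. The data frame `demo_na` is hypothetical and contains one deliberately missing value; it only illustrates how to count missing entries and see how many rows remain.

```r
# Hypothetical data with a single missing predictor value
demo_na <- data.frame(
  V2 = c(27.0, 22.9, NA, 29.1, 28.1, 26.2),
  V8 = c(89.8, 121.3, 115.2, 100.3, 123.0, 124.8)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(demo_na))

# Keep only the complete rows
complete_rows <- na.omit(demo_na)

# Fit on the cleaned data and check how many observations were used
fit_na <- lm(V8 ~ V2, data = complete_rows)
n_used <- nobs(fit_na)
```

With R's default `na.action` setting, `lm()` already drops incomplete rows on its own, so the main value of this check is seeing how many rows actually reach the fit.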