I created a dataset with 8 variables and 400 rows like the ones below. My dependent variable is the total freight across 20 regions. The w_o, v_o and u_d columns are the population, GDP and km of highway of the region.
fulldata = cbind(matrix(a,400,1),orig, dest, matrix(distanz,400,1))
fulldata
dep u_o v_o w_o u_d v_d w_d distanz
[1,] 46101718 27253 4392526 821 27253 4392526 821 89
[2,] 204380 32141 126883 114 27253 4392526 821 113
[3,] 5789359 28238 1565307 375 27253 4392526 821 170
[4,] 11449059 33745 10019166 679 27253 4392526 821 138
[5,] 389580 35525 1062860 212 27253 4392526 821 289
[6,] 2642751 29003 4907529 576 27253 4392526 821 405
[7,] 231159 27532 1217872 210 27253 4392526 821 541
[8,] 2844613 31539 4448841 568 27253 4392526 821 327
[9,] 1481309 27821 3742437 448 27253 4392526 821 400
[10,] 399624 22396 888908 59 27253 4392526 821 551
[11,] 262570 24726 1538055 168 27253 4392526 821 544
[12,] 499115 29624 5898124 485 27253 4392526 821 669
[13,] 249596 22945 1322247 352 27253 4392526 821 720
[14,] 42501 18447 310449 36 27253 4392526 821 857
[15,] 273450 16219 5839084 442 27253 4392526 821 869
[16,] 306917 16512 4063888 313 27253 4392526 821 998
[17,] 167326 19663 570365 29 27253 4392526 821 995
[18,] 26384 15514 1965128 295 27253 4392526 821 1275
[19,] 20189 16289 5056641 662 27253 4392526 821 1584
[20,] 0 18539 1653135 23 27253 4392526 821 933
Now I need to run a regression on these 20 rows, with the "dep" column as y. I tried this code:
lm <- lm(fulldata[1:19]~fulldata[1:19,2]+fulldata[1:19,3]+fulldata[1:19,4]+fulldata[1:19,5]+fulldata[1:19,6]+fulldata[1:19,7]+fulldata[1:19,8])
and this was the result:
summary(lm)
Call:
lm(formula = fulldata[1:19] ~ fulldata[1:19, 2] + fulldata[1:19,
3] + fulldata[1:19, 4] + fulldata[1:19, 5] + fulldata[1:19,
6] + fulldata[1:19, 7] + fulldata[1:19, 8])
Residuals:
Min 1Q Median 3Q Max
-7970288 -6278944 31922 3227442 15159011
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.805e+07 1.668e+07 2.282 0.03866 *
fulldata[1:19, 2] -1.185e+03 5.006e+02 -2.368 0.03283 *
fulldata[1:19, 3] -1.727e+00 1.076e+00 -1.605 0.13089
fulldata[1:19, 4] 4.252e+04 1.195e+04 3.558 0.00315 **
fulldata[1:19, 5] NA NA NA NA
fulldata[1:19, 6] NA NA NA NA
fulldata[1:19, 7] NA NA NA NA
fulldata[1:19, 8] -2.390e+04 7.779e+03 -3.072 0.00828 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6894000 on 14 degrees of freedom
Multiple R-squared: 0.6714, Adjusted R-squared: 0.5775
F-statistic: 7.151 on 4 and 14 DF, p-value: 0.002359
Is my regression code right? Because 3 columns hold the same number in every row, their coefficients come out as NA, and I don't know how to avoid that. I hope I was clear. Thanks to all.
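You get NAs for these columns because they are constants. Your model already has a constant term, the intercept, and each of these columns is just a multiple of it (u_d is 27253 times the intercept column, and so on). They are therefore perfectly collinear with the intercept, which is what "3 not defined because of singularities" means. They don't vary, so they can't explain any variation in your dependent variable. They're not informative.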
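The effect is easy to reproduce on made-up data. The sketch below uses a hypothetical data frame toy with columns x, k and y (none of these come from the question). Because k holds the same value in every row, lm() cannot separate its effect from the intercept and reports its coefficient as NA. alias() from the stats package then lists which terms are exact linear combinations of the others.

```r
# Hypothetical toy data: x varies, k is the same in every row
set.seed(1)
toy <- data.frame(x = 1:10, k = rep(821, 10))
toy$y <- 3 + 2 * toy$x + rnorm(10)

fit <- lm(y ~ x + k, data = toy)

# k equals 821 times the intercept column, so lm() drops it as aliased
coef(fit)          # the entry for k is NA
summary(fit)       # reports "1 not defined because of singularities"

# alias() shows which terms are exact linear combinations of others
alias(fit)
```

Dropping k from the formula gives the same fitted values. The NA is not an error, just lm() telling you that the column carries no extra information.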
You should just drop them from the regression equation.
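One way to do that without listing columns by hand is sketched below. This is a minimal sketch, assuming fulldata is a numeric matrix with the column names shown in the printout. To keep it self-contained, it rebuilds only rows 1-19 from the printed values. The printout may be rounded, so use your real fulldata instead. The sketch converts the matrix to a data frame so the formula can refer to column names, drops every column with a single distinct value, and fits dep ~ . on what remains. It also addresses the code itself: fulldata[1:19] indexes the matrix as one column-major vector, which happens to give rows 1-19 of dep only because dep is the first column.

```r
# Stand-in for the question's data: rows 1-19 of fulldata, values copied
# from the printout. Columns here are dep, u_o, v_o, w_o, distanz.
orig_rows <- matrix(c(
  46101718, 27253,  4392526, 821,   89,
    204380, 32141,   126883, 114,  113,
   5789359, 28238,  1565307, 375,  170,
  11449059, 33745, 10019166, 679,  138,
    389580, 35525,  1062860, 212,  289,
   2642751, 29003,  4907529, 576,  405,
    231159, 27532,  1217872, 210,  541,
   2844613, 31539,  4448841, 568,  327,
   1481309, 27821,  3742437, 448,  400,
    399624, 22396,   888908,  59,  551,
    262570, 24726,  1538055, 168,  544,
    499115, 29624,  5898124, 485,  669,
    249596, 22945,  1322247, 352,  720,
     42501, 18447,   310449,  36,  857,
    273450, 16219,  5839084, 442,  869,
    306917, 16512,  4063888, 313,  998,
    167326, 19663,   570365,  29,  995,
     26384, 15514,  1965128, 295, 1275,
     20189, 16289,  5056641, 662, 1584
), ncol = 5, byrow = TRUE)

# Add the destination columns, which are constant in these rows
fulldata <- cbind(orig_rows[, 1:4], 27253, 4392526, 821, orig_rows[, 5])
colnames(fulldata) <- c("dep", "u_o", "v_o", "w_o", "u_d", "v_d", "w_d", "distanz")

# fulldata[1:19] reads the matrix column by column, so its first 19
# elements are rows 1-19 of column 1; fulldata[1:19, 1] says that explicitly
identical(fulldata[1:19], fulldata[1:19, 1])

df <- as.data.frame(fulldata[1:19, ])

# Keep only the columns with more than one distinct value in these rows
varies <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
df_sub <- df[, varies, drop = FALSE]

# dep ~ . regresses dep on every remaining column; naming the result
# fit (not lm) avoids masking the lm() function
fit <- lm(dep ~ ., data = df_sub)
summary(fit)
```

Because the constant-column check runs on whatever rows you pass in, it adapts to the subset. On rows where the destination changes (presumably the full 400 rows), u_d, v_d and w_d vary and would be kept in the model.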