How to do a regression with variables with the same number?
I created a dataset with 8 variables and 400 rows like the ones shown below. My dependent variable is the sum of all the freight in 20 regions. The w_o, v_o and u_d are the population, GDP, and km of highway of the region.
fulldata = cbind(matrix(a,400,1),orig, dest, matrix(distanz,400,1))
fulldata
dep u_o v_o w_o u_d v_d w_d distanz
[1,] 46101718 27253 4392526 821 27253 4392526 821 89
[2,] 204380 32141 126883 114 27253 4392526 821 113
[3,] 5789359 28238 1565307 375 27253 4392526 821 170
[4,] 11449059 33745 10019166 679 27253 4392526 821 138
[5,] 389580 35525 1062860 212 27253 4392526 821 289
[6,] 2642751 29003 4907529 576 27253 4392526 821 405
[7,] 231159 27532 1217872 210 27253 4392526 821 541
[8,] 2844613 31539 4448841 568 27253 4392526 821 327
[9,] 1481309 27821 3742437 448 27253 4392526 821 400
[10,] 399624 22396 888908 59 27253 4392526 821 551
[11,] 262570 24726 1538055 168 27253 4392526 821 544
[12,] 499115 29624 5898124 485 27253 4392526 821 669
[13,] 249596 22945 1322247 352 27253 4392526 821 720
[14,] 42501 18447 310449 36 27253 4392526 821 857
[15,] 273450 16219 5839084 442 27253 4392526 821 869
[16,] 306917 16512 4063888 313 27253 4392526 821 998
[17,] 167326 19663 570365 29 27253 4392526 821 995
[18,] 26384 15514 1965128 295 27253 4392526 821 1275
[19,] 20189 16289 5056641 662 27253 4392526 821 1584
[20,] 0 18539 1653135 23 27253 4392526 821 933
Now I have to run a regression on these 20 rows, where y should be the "dep" column. I tried this code:
lm <- lm(fulldata[1:19]~fulldata[1:19,2]+fulldata[1:19,3]+fulldata[1:19,4]+fulldata[1:19,5]+fulldata[1:19,6]+fulldata[1:19,7]+fulldata[1:19,8])
and the result was:
summary(lm)
Call:
lm(formula = fulldata[1:19] ~ fulldata[1:19, 2] + fulldata[1:19,
3] + fulldata[1:19, 4] + fulldata[1:19, 5] + fulldata[1:19,
6] + fulldata[1:19, 7] + fulldata[1:19, 8])
Residuals:
Min 1Q Median 3Q Max
-7970288 -6278944 31922 3227442 15159011
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.805e+07 1.668e+07 2.282 0.03866 *
fulldata[1:19, 2] -1.185e+03 5.006e+02 -2.368 0.03283 *
fulldata[1:19, 3] -1.727e+00 1.076e+00 -1.605 0.13089
fulldata[1:19, 4] 4.252e+04 1.195e+04 3.558 0.00315 **
fulldata[1:19, 5] NA NA NA NA
fulldata[1:19, 6] NA NA NA NA
fulldata[1:19, 7] NA NA NA NA
fulldata[1:19, 8] -2.390e+04 7.779e+03 -3.072 0.00828 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6894000 on 14 degrees of freedom
Multiple R-squared: 0.6714, Adjusted R-squared: 0.5775
F-statistic: 7.151 on 4 and 14 DF, p-value: 0.002359
Is the regression code right? Because 3 columns contain the same number in every row, their coefficients come out as NA, and I don't know how to avoid it. I hope I was clear. Thanks to all.
You get NA coefficients for these columns because they are constants. Your regression model already has a constant in the form of its intercept, so these columns add nothing. They don't vary, so they can't explain variation in your dependent variable. They're not informative.
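To see why a constant column cannot be estimated alongside an intercept, here is a minimal sketch on toy data. The data frame `toy` and its values are hypothetical and not taken from the question; only the constant 27253 mirrors the u_d column. The constant column is an exact multiple of the intercept column, so the design matrix loses one rank and `lm()` reports NA for it.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  y     = c(2, 4, 5, 4, 6),
  x     = c(1, 2, 3, 4, 5),
  const = 27253            # same value in every row, like u_d in the question
)

# Design matrix that lm() builds: intercept, x, const
X <- model.matrix(y ~ x + const, data = toy)

# const equals 27253 times the intercept column, so the rank is 2, not 3
rank_X <- qr(X)$rank

# lm() cannot separate const from the intercept and reports NA for it
toy_fit <- lm(y ~ x + const, data = toy)
coef(toy_fit)
```

The same thing happens with u_d, v_d and w_d in the question: each is a multiple of the intercept column within the rows used for the fit.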
You should just drop them from the regression equation.
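One way to do that is sketched below, assuming the data are converted to a data frame with the column names shown in the question. Only the first 10 rows of the question's matrix are reproduced here to keep the example short, and the helper names `is_constant` and `constant_cols` are made up for illustration. Using named columns in the formula also avoids the repeated `fulldata[1:19, k]` indexing.

```r
# First 10 rows of the question's matrix (a subset, for illustration only)
fulldata <- cbind(
  dep     = c(46101718, 204380, 5789359, 11449059, 389580,
              2642751, 231159, 2844613, 1481309, 399624),
  u_o     = c(27253, 32141, 28238, 33745, 35525,
              29003, 27532, 31539, 27821, 22396),
  v_o     = c(4392526, 126883, 1565307, 10019166, 1062860,
              4907529, 1217872, 4448841, 3742437, 888908),
  w_o     = c(821, 114, 375, 679, 212, 576, 210, 568, 448, 59),
  u_d     = 27253,      # constant in these rows (recycled by cbind)
  v_d     = 4392526,    # constant in these rows
  w_d     = 821,        # constant in these rows
  distanz = c(89, 113, 170, 138, 289, 405, 541, 327, 400, 551)
)
dat <- as.data.frame(fulldata)

# Flag columns that take a single value in the rows being fitted
is_constant   <- vapply(dat, function(col) length(unique(col)) == 1, logical(1))
constant_cols <- names(dat)[is_constant]

# Fit on the remaining columns; "." means every column except dep
fit <- lm(dep ~ ., data = dat[, !is_constant])
summary(fit)
```

Note that the destination columns are constant only within this block of rows. If the model is fitted on all 400 rows, where the destination changes, they vary and can stay in the formula.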