How to do a regression when some variables have the same value in every row?

I created this kind of database, which has 8 variables, and I have 400 rows like that. My dependent variable is the sum of all the freight in the 20 regions. The w_o, v_o and u_d are the population, GDP, and km of highway of the region.

    fulldata = cbind(matrix(a,400,1),orig, dest, matrix(distanz,400,1))
    fulldata
               dep   u_o      v_o w_o   u_d      v_d w_d distanz
    [1,]  46101718 27253  4392526 821 27253  4392526 821      89
    [2,]    204380 32141   126883 114 27253  4392526 821     113
    [3,]   5789359 28238  1565307 375 27253  4392526 821     170
    [4,]  11449059 33745 10019166 679 27253  4392526 821     138
    [5,]    389580 35525  1062860 212 27253  4392526 821     289
    [6,]   2642751 29003  4907529 576 27253  4392526 821     405
    [7,]    231159 27532  1217872 210 27253  4392526 821     541
    [8,]   2844613 31539  4448841 568 27253  4392526 821     327
    [9,]   1481309 27821  3742437 448 27253  4392526 821     400
    [10,]    399624 22396   888908  59 27253  4392526 821     551
    [11,]    262570 24726  1538055 168 27253  4392526 821     544
    [12,]    499115 29624  5898124 485 27253  4392526 821     669
    [13,]    249596 22945  1322247 352 27253  4392526 821     720
    [14,]     42501 18447   310449  36 27253  4392526 821     857
    [15,]    273450 16219  5839084 442 27253  4392526 821     869
    [16,]    306917 16512  4063888 313 27253  4392526 821     998
    [17,]    167326 19663   570365  29 27253  4392526 821     995
    [18,]     26384 15514  1965128 295 27253  4392526 821    1275
    [19,]     20189 16289  5056641 662 27253  4392526 821    1584
    [20,]         0 18539  1653135  23 27253  4392526 821     933

Now I have to run a regression on these 20 rows, where my y should be the "dep" column. I tried this code:

    lm <- lm(fulldata[1:19]~fulldata[1:19,2]+fulldata[1:19,3]+fulldata[1:19,4]+fulldata[1:19,5]+fulldata[1:19,6]+fulldata[1:19,7]+fulldata[1:19,8])

and the result was:

    summary(lm)
    Call:
    lm(formula = fulldata[1:19] ~ fulldata[1:19, 2] + fulldata[1:19, 
    3] + fulldata[1:19, 4] + fulldata[1:19, 5] + fulldata[1:19, 
    6] + fulldata[1:19, 7] + fulldata[1:19, 8])

    Residuals:
         Min       1Q   Median       3Q      Max 
    -7970288 -6278944    31922  3227442 15159011 

    Coefficients: (3 not defined because of singularities)
                         Estimate Std. Error t value Pr(>|t|)   
    (Intercept)        3.805e+07  1.668e+07   2.282  0.03866 * 
    fulldata[1:19, 2] -1.185e+03  5.006e+02  -2.368  0.03283 * 
    fulldata[1:19, 3] -1.727e+00  1.076e+00  -1.605  0.13089   
    fulldata[1:19, 4]  4.252e+04  1.195e+04   3.558  0.00315 **
    fulldata[1:19, 5]         NA         NA      NA       NA   
    fulldata[1:19, 6]         NA         NA      NA       NA   
    fulldata[1:19, 7]         NA         NA      NA       NA   
    fulldata[1:19, 8] -2.390e+04  7.779e+03  -3.072  0.00828 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 6894000 on 14 degrees of freedom
    Multiple R-squared:  0.6714,    Adjusted R-squared:  0.5775 
    F-statistic: 7.151 on 4 and 14 DF,  p-value: 0.002359

Is the regression code right? Because 3 columns contain the same number in every row, their coefficients come out as NA, and I don't know how to avoid that. I hope I was clear. Thanks to all.

You have NAs for these columns because they are constants. Your regression model already has a constant in the form of the intercept, so these columns are perfectly collinear with it and play no role. They don't vary, so they can't explain variation in your dependent variable. They're not informative.
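To see the effect in isolation, here is a minimal sketch on made-up data (the names `x`, `y`, `k` and `toy_fit` are only for illustration and are not part of your dataset). A constant column is an exact multiple of the intercept column, so the design matrix loses a rank and `lm` reports `NA` for it:

```r
# Toy data, invented purely for illustration.
set.seed(1)
x <- rnorm(10)
y <- 2 * x + rnorm(10)
k <- rep(5, 10)                 # a predictor that never varies

# The design matrix has 3 columns: intercept, x, k.
X <- model.matrix(~ x + k)
ncol(X)
qr(X)$rank                      # one less than ncol(X): k is 5 times the intercept column

# lm() keeps the intercept and x, and leaves k undefined (NA).
toy_fit <- lm(y ~ x + k)
coef(toy_fit)
```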

You should just drop them from the regression equation.
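One way to do that, as a sketch rather than the only option: put the 19 rows used in the question into a data frame with named columns, detect the columns that hold a single value, and fit on the rest. The names `is_constant` and `fit` are my own, and the values are copied from the rows printed in the question.

```r
# Rows 1 to 19 from the question, as a data frame with named columns.
fulldata <- data.frame(
  dep = c(46101718, 204380, 5789359, 11449059, 389580, 2642751, 231159,
          2844613, 1481309, 399624, 262570, 499115, 249596, 42501,
          273450, 306917, 167326, 26384, 20189),
  u_o = c(27253, 32141, 28238, 33745, 35525, 29003, 27532, 31539, 27821,
          22396, 24726, 29624, 22945, 18447, 16219, 16512, 19663, 15514,
          16289),
  v_o = c(4392526, 126883, 1565307, 10019166, 1062860, 4907529, 1217872,
          4448841, 3742437, 888908, 1538055, 5898124, 1322247, 310449,
          5839084, 4063888, 570365, 1965128, 5056641),
  w_o = c(821, 114, 375, 679, 212, 576, 210, 568, 448, 59, 168, 485, 352,
          36, 442, 313, 29, 295, 662),
  u_d = 27253,      # constant in this subset
  v_d = 4392526,    # constant in this subset
  w_d = 821,        # constant in this subset
  distanz = c(89, 113, 170, 138, 289, 405, 541, 327, 400, 551, 544, 669,
              720, 857, 869, 998, 995, 1275, 1584)
)

# Flag the columns that take only one distinct value.
is_constant <- vapply(fulldata, function(col) length(unique(col)) == 1,
                      logical(1))
names(fulldata)[is_constant]

# Regress dep on every remaining column; no coefficient should be NA now.
fit <- lm(dep ~ ., data = fulldata[, !is_constant])
summary(fit)
```

Using a data frame and a formula with column names also makes the output easier to read than `fulldata[1:19, 2]`, and storing the result as `fit` avoids reusing the name of the `lm` function. Note that the destination columns are only constant within this block of rows; across all 400 rows they presumably vary and could be kept in a regression on the full data.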
