
lm function in R is excluding 1 dummy variable

I have a dataframe that looks like this:

      Date    A      B      MONTH
2016-01-01    3     10    January
2016-01-02    5     13    January
2016-01-03    8     12    January
.
.
.
2016-12-29    4     13   December
2016-12-30    5     12   December
2016-12-31    6      4   December

With this dataframe, I want to run a regression model with the Month column as dummy variables.

I have tried two methods, and both of them exclude the month "April".

Any idea why this may be happening?

1st method:

lm(A ~ MONTH + B, data = df)

Example output:

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        7.248e+01  3.600e+01   0.902  0.36754    
MONTHAugust        7.425e+02  3.630e+01   6.680 9.29e-11 ***
MONTHDecember     -1.840e+02  3.277e+01  -5.613 4.02e-08 ***
MONTHFebruary     -8.673e+00  2.855e+01  -0.129  0.89770    
MONTHJanuary      -4.084e+01  2.945e+01  -0.368  0.71291    
MONTHJuly          9.407e+02  3.100e+01   4.540 7.73e-06 ***
MONTHJune          3.387e+01  3.077e+01   2.401  0.01687 *  
MONTHMarch         2.797e+02  2.884e+01   6.231 1.32e-09 ***
MONTHMay          -9.500e+01  3.122e+01  -3.043  0.00252 ** 
MONTHNovember     -1.321e+01  3.555e+01  -1.778  0.07626 .  
MONTHOctober       7.145e+01  3.200e+01   0.983  0.32637    
MONTHSeptember     9.691e+02  3.916e+01   4.319 2.04e-05 ***
B                  5.279e-02  1.161e-03  11.013  < 2e-16 ***

2nd Method:

A <- model.matrix(A ~ B + MONTH, df)

head(A)

  (Intercept) Sum.of.Media.Cost MONTHAugust MONTHDecember MONTHFebruary MONTHJanuary MONTHJuly MONTHJune MONTHMarch MONTHMay
1           1                 0           0             0             0            1         0         0          0        0
2           1                 0           0             0             0            1         0         0          0        0
3           1                 0           0             0             0            1         0         0          0        0
4           1                 0           0             0             0            1         0         0          0        0
5           1                 0           0             0             0            1         0         0          0        0
6           1                 0           0             0             0            1         0         0          0        0
  MONTHNovember MONTHOctober MONTHSeptember
1             0            0              0
2             0            0              0
3             0            0              0
4             0            0              0
5             0            0              0
6             0            0              0

This is normal when you work with dummy variables. If your factor variable has n levels, you need only n-1 dummy variables, because the remaining level is the case where all the dummy variables are zero; its effect is absorbed into the intercept. I think April is the month that gets excluded because it comes first in alphabetical order.
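To see which month R treats as the baseline, and to choose a different one, something like the following should work. It is only a sketch: the data frame is simulated to resemble the one in the question (the values are made up), and the calendar ordering via month.name and the choice of July in relevel() are just examples.

```r
# Toy data standing in for the question's df (all values are made up).
set.seed(1)
dates <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day")
df <- data.frame(
  Date  = dates,
  B     = rpois(length(dates), 10),
  MONTH = month.name[as.integer(format(dates, "%m"))]
)
df$A <- rpois(nrow(df), 5)

# A character column becomes a factor with sorted levels,
# so "April" ends up as the first (reference) level.
first_level <- levels(factor(df$MONTH))[1]
first_level

# Put the months in calendar order so that January is the baseline instead.
df$MONTH <- factor(df$MONTH, levels = month.name)
fit <- lm(A ~ MONTH + B, data = df)
names(coef(fit))  # no MONTHJanuary: January is now absorbed into the intercept

# relevel() makes any other level the baseline, e.g. July.
df$MONTH <- relevel(df$MONTH, ref = "July")
levels(df$MONTH)[1]
```

Whichever level comes first, each MONTH coefficient is that month's difference from the baseline month, holding B fixed.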

Try A ~ B + MONTH - 1. If your set of dummies is complete, the dummy columns add up to the constant column, so the design matrix has reduced rank. You cannot estimate that model, so something has to give.

Either you keep the constant (and remove one monthly dummy) to get a "per-month offset to the intercept", or, and that is what I would do, you remove the constant to get a "monthly intercept".
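A small sketch of the two options, again on simulated data (the values and the object names fit_offsets, fit_intercepts and X_full are made up for illustration). It also checks the rank problem that appears when the constant and all twelve dummies are kept together.

```r
# Simulated data: 10 observations per month, values are arbitrary.
set.seed(1)
df <- data.frame(
  MONTH = factor(rep(month.name, each = 10), levels = month.name),
  B     = rnorm(120)
)
df$A <- rnorm(120)

fit_offsets    <- lm(A ~ B + MONTH, data = df)      # constant + 11 monthly offsets
fit_intercepts <- lm(A ~ B + MONTH - 1, data = df)  # 12 monthly intercepts, no constant

# Constant plus all 12 dummies plus B: 14 columns, but the rank is lower.
X_full <- cbind(1, model.matrix(~ B + MONTH - 1, data = df))
ncol(X_full)
qr(X_full)$rank

# Both models give the same fitted values; only the parameterization differs.
all.equal(fitted(fit_offsets), fitted(fit_intercepts))

# A monthly intercept equals the baseline intercept plus that month's offset.
feb_intercept <- unname(coef(fit_intercepts)["MONTHFebruary"])
feb_rebuilt   <- unname(coef(fit_offsets)["(Intercept)"] +
                        coef(fit_offsets)["MONTHFebruary"])
all.equal(feb_intercept, feb_rebuilt)
```

One thing to keep in mind: summary() computes R-squared differently for a model without a constant, so that number is not comparable between the two fits even though the fitted values are identical.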
