
How to exclude a level from a factor variable in robust regression?

As the two regressions below show, lm() returns a fit while rlm() stops with an error because the design matrix is singular. lm() works around the singularity by dropping an aliased column (here one factor level), but rlm() does not.

Ordinary least squares case:

  result.lm <- lm(log(export + import) ~ log(gdp.i*gdp.j) + 
        log(dis) + log(Sij) + AFC + GFC + I(dpgdp*0.001)+ 
        factor(id),
        data = mydata)
  summary(result.lm)


Coefficients: (1 not defined because of singularities)
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -130.56396   34.10023  -3.829 0.000177 ***
log(gdp.i * gdp.j)    1.73176    0.20873   8.297 2.39e-14 ***
log(dis)              6.89208    4.75270   1.450 0.148750    
log(Sij)             -5.18435    1.80221  -2.877 0.004502 ** 
AFC                  -1.00819    0.86188  -1.170 0.243640    
GFC                   0.49326    0.58950   0.837 0.403834    
I(dpgdp * 0.001)     -0.05713    0.03733  -1.530 0.127701    
factor(id)IDN_PHL    -7.02467    4.46062  -1.575 0.117044    
factor(id)IDN_SGP     4.10315    1.42839   2.873 0.004558 ** 
factor(id)IDN_THA    -3.37530    3.44619  -0.979 0.328675    
factor(id)IDN_VNM   -11.75983    5.24573  -2.242 0.026189 *  
factor(id)MYS_SGP    12.16543    6.13940   1.982 0.049045 *  
factor(id)MYS_THA     2.75659    0.72603   3.797 0.000200 ***
factor(id)MYS_VNM    -5.31554    3.01239  -1.765 0.079325 .  
factor(id)PHL_MYS    -3.74970    3.82106  -0.981 0.327742    
factor(id)PHL_SGP    -3.72441    3.84997  -0.967 0.334642    
factor(id)PHL_THA    -2.32179    3.26691  -0.711 0.478187    
factor(id)PHL_VNM    -2.43611    2.32941  -1.046 0.297045    
factor(id)SGP_THA     4.18854    1.68147   2.491 0.013639 *  
factor(id)SGP_VNM    -0.54607    3.62445  -0.151 0.880409    
factor(id)THA_VNM          NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.982 on 181 degrees of freedom
Multiple R-squared:  0.5808,    Adjusted R-squared:  0.5368 
F-statistic:  13.2 on 19 and 181 DF,  p-value: < 2.2e-16

The robust regression case:

To avoid the singularity in rlm() from the MASS package, I tried:

library(car); library(MASS)
  result.rlm <- rlm(log(export + import) ~ log(gdp.i*gdp.j) + 
        log(dis) + log(Sij) + AFC + GFC + I(dpgdp*0.001)+ 
        factor(id, exclude == "THA_VNM"),
        data = mydata)

which returns the error message:

Error in rlm.default(x, y, weights, method = method, wt.method = wt.method,  : 
  'x' is singular: singular fits are not implemented in 'rlm'
In addition: Warning message:
In as.vector(exclude, typeof(x)) : NAs introduced by coercion

How can I exclude a level from the factor id to get results from the rlm() function?
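
For reference, exclude in factor() is an argument set with =, not a comparison written with ==, and it takes a vector of values to leave out of the levels. Even written that way it would probably not do what is wanted here: observations holding an excluded value become NA, so with the default NA handling those rows would drop out of the fit instead of only losing their dummy column. A small sketch with a made-up vector pair (not taken from mydata):

```r
# 'pair' is a hypothetical vector used only for illustration
pair <- c("IDN_MYS", "THA_VNM", "IDN_MYS", "SGP_VNM")

# exclude = <values> removes those values from the set of levels
f <- factor(pair, exclude = "THA_VNM")

levels(f)   # the remaining levels, without THA_VNM
is.na(f)    # TRUE for the observation that held THA_VNM
```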

A portion of mydata that can be used to replicate the problem is given below:

 id export  import  gdp.i   gdp.j   dis Sij AFC GFC dpgdp
 PHL_MYS    21090   54082   1.03E+11    1.44E+11    2470.863    0.243267763 0   0   4352.999196
 IDN_MYS    1273344 6350191 2.86E+11    1.44E+11    1174.196    0.222531092 0   0   4280.470783
 IDN_PHL    1352286 6501568 2.86E+11    1.03E+11    2792.088    0.194772855 0   0   72.528413
 MYS_SGP    11849639    3100352 1.44E+11    1.24E+11    315.5433    0.248594031 0   0   23398.89647
 PHL_SGP    1010140 3406594 1.03E+11    1.24E+11    2396.775    0.247965171 0   0   27751.89567
 IDN_SGP    62247342    2374634 2.86E+11    1.24E+11    886.1407    0.210675425 0   0   27679.36726
 PHL_THA    126863901   131288917   1.03E+11    1.76E+11    2210.015    0.232802184 0   0   1489.015435
 IDN_THA    174813908   406988998   2.86E+11    1.76E+11    2316.466    0.23596528  0   0   1416.487022
 SGP_THA    131102650   238482275   1.24E+11    1.76E+11    1433.936    0.242235497 0   0   26262.88023
 MYS_THA    45626339    92914926    1.44E+11    1.76E+11    1187.123    0.247368503 0   0   2863.983761
 IDN_VNM    14635829    1705791 2.86E+11    57633255739 3023.314    0.139630737 0   0   573.9790196
 MYS_VNM    1140384 19607   1.44E+11    57633255739 2040.94 0.204415889 0   0   4854.449802
 SGP_VNM    5413912 5137507 1.24E+11    57633255739 2207.195    0.216937571 0   0   28253.34628
 THA_VNM    10375   316 1.76E+11    57633255739 990.7018    0.185642137 0   0   1990.466041
 IDN_MYS    61500692    3164431 3.65E+11    1.63E+11    1174.196    0.213350529 0   0   4578.607187
 IDN_SGP    12985625    23866106    3.65E+11    1.39E+11    886.1407    0.199850336 0   0   29984.59857
 PHL_SGP    18400   116669  1.22E+11    1.39E+11    2396.775    0.248964804 0   0   30186.80165
 MYS_SGP    14410298    2247747 1.63E+11    1.39E+11    315.5433    0.248461189 0   0   25405.99138
 MYS_THA    68755379    26223833    1.63E+11    2.07E+11    1187.123    0.246396222 0   0   3036.402002
 IDN_THA    49410654    138502983   3.65E+11    2.07E+11    2316.466    0.231027427 0   0   1542.205185
 PHL_THA    72197200    166850064   1.22E+11    2.07E+11    2210.015    0.233390872 0   0   1744.408267
 SGP_THA    132220146   277333084   1.39E+11    2.07E+11    1433.936    0.240330641 0   0   28442.39339
 PHL_VNM    40525   3475176 1.22E+11    66371664817 1750.016    0.228081191 0   0   602.1758296
 SGP_VNM    9686544 6916182 1.39E+11    66371664817 2207.195    0.218722399 0   0   30788.97748
 MYS_VNM    118597  107725  1.63E+11    66371664817 2040.94 0.205795797 0   0   5382.986099
 THA_VNM    2925753 11249569    2.07E+11    66371664817 990.7018    0.183801906 0   0   2346.584097
 IDN_MYS    88079132    24559821    4.32E+11    1.94E+11    1174.196    0.213634936 0   0   5347.115267
 IDN_PHL    25877152    6138473 4.32E+11    1.49E+11    2792.088    0.190862982 0   0   190.7373159
 MYS_SGP    25406889    2592050 1.94E+11    1.69E+11    315.5433    0.248823886 0   0   29547.92915
 IDN_SGP    104020998   4359943 4.32E+11    1.69E+11    886.1407    0.201927152 0   0   34895.04442
 PHL_THA    51290535    259950903   1.49E+11    2.47E+11    2210.015    0.234834327 0   0   2057.166994
 IDN_THA    15456842    82233669    4.32E+11    2.47E+11    2316.466    0.2314039   0   0   1866.429678
 MYS_THA    25580025    405724623   1.94E+11    2.47E+11    1187.123    0.246323269 0   0   3480.685589
 SGP_THA    109397804   181203225   1.69E+11    2.47E+11    1433.936    0.241136255 0   0   33028.61474
 IDN_VNM    116169  411089  4.32E+11    77414425532 3023.314    0.128828319 0   0   952.108653
 MYS_VNM    78770   5099    1.94E+11    77414425532 2040.94 0.204073979 0   0   6299.22392
 PHL_VNM    12322   442466  1.49E+11    77414425532 1750.016    0.224837139 0   0   761.3713371
 SGP_VNM    12167407    6959737 1.69E+11    77414425532 2207.195    0.215604147 0   0   35847.15307
 THA_VNM    192568  15221723    2.47E+11    77414425532 990.7018    0.181693618 0   0   2818.538331
 IDN_MYS    195446734   38077097    5.10E+11    2.31E+11    1174.196    0.214515503 0   0   6282.103658
 IDN_PHL    2221    1074    5.10E+11    1.74E+11    2792.088    0.18941607  0   0   257.2702704
 IDN_SGP    137780587   131012335   5.10E+11    1.79E+11    886.1407    0.192218812 0   0   34794.08436
 MYS_SGP    29608269    1785983 2.31E+11    1.79E+11    315.5433    0.245966948 0   0   28511.9807
 IDN_THA    83960790    638313022   5.10E+11    2.73E+11    2316.466    0.226956384 0   0   1940.13688
 PHL_THA    93904304    489639916   1.74E+11    2.73E+11    2210.015    0.237698194 0   0   2197.40715
 MYS_THA    27635575    463572856   2.31E+11    2.73E+11    1187.123    0.248294683 0   0   4341.966779
 SGP_THA    91150086    272714486   1.79E+11    2.73E+11    1433.936    0.239243442 0   0   32853.94748
 SGP_VNM    32692201    8777399 1.79E+11    99130304099 2207.195    0.229411821 0   0   35807.78867
 PHL_VNM    1183981 452291  1.74E+11    99130304099 1750.016    0.231359489 0   0   756.4340478
 MYS_VNM    339799  1114755 2.31E+11    99130304099 2040.94 0.210114801 0   0   7295.807976
 THA_VNM    278151  32005   2.73E+11    99130304099 990.7018    0.195565711 0   0   2953.841198
 IDN_VNM    40753   568034  5.10E+11    99130304099 3023.314    0.13621204  0   0   1013.704318
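
The listing can be loaded with read.table(). The sketch below uses only four of the rows above, assumes the columns are whitespace separated, and uses the made-up names sample_txt and mydata_sample. It also counts the distinct dis values per id: in the listing dis never changes within an id, which means log(dis) is a linear combination of the id dummies and the intercept.

```r
# Four rows copied from the listing; object names are illustrative only
sample_txt <- "id export import gdp.i gdp.j dis Sij AFC GFC dpgdp
IDN_MYS 1273344 6350191 2.86E+11 1.44E+11 1174.196 0.222531092 0 0 4280.470783
THA_VNM 10375 316 1.76E+11 57633255739 990.7018 0.185642137 0 0 1990.466041
IDN_MYS 61500692 3164431 3.65E+11 1.63E+11 1174.196 0.213350529 0 0 4578.607187
THA_VNM 2925753 11249569 2.07E+11 66371664817 990.7018 0.183801906 0 0 2346.584097"

mydata_sample <- read.table(text = sample_txt, header = TRUE,
                            stringsAsFactors = FALSE)

# Number of distinct distances observed for each id
dis_values_per_id <- tapply(mydata_sample$dis, mydata_sample$id,
                            function(v) length(unique(v)))
dis_values_per_id
```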

One workaround is slightly roundabout: first build a matrix of dummy variables from the ids, then call rlm() with only some of those dummy columns in the formula, leaving out the columns for the levels to be dropped.

# create one 0/1 dummy column per id level

idx <- sort(unique(as.character(mydata$id)))
dummy <- matrix(NA_integer_, nrow = nrow(mydata), ncol = length(idx))

for (j in seq_along(idx)) {
  dummy[, j] <- as.integer(mydata$id == idx[j])
}
dummy <- data.frame(dummy)
names(dummy) <- idx
mydata <- cbind(mydata, dummy)
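
The same kind of indicator matrix can also be obtained from model.matrix(). A minimal sketch on a made-up data frame toy (not mydata), where ~ 0 + id asks for one column per level with no intercept:

```r
# 'toy' is a hypothetical data frame used only for illustration
toy <- data.frame(id = c("IDN_MYS", "THA_VNM", "IDN_MYS", "SGP_VNM"))

# One 0/1 column per level of id; columns come out named "id<level>"
dummy_mm <- model.matrix(~ 0 + id, data = toy)

# Strip the "id" prefix so the columns carry the bare level names
colnames(dummy_mm) <- sub("^id", "", colnames(dummy_mm))
dummy_mm
```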

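# run rlm with only 13 of the 15 id dummies.
# Two have to go: one because the intercept is kept, and one because
# log(dis) is constant within each id and so is collinear with the dummies.
# unique() lists ids in order of first appearance, so [1:13] leaves out
# the last two ids to appear in mydata.

library(MASS)

model <- as.formula(paste("log(export + import) ~ log(gdp.i*gdp.j) + 
            log(dis)+ log(Sij) + AFC + GFC + I(dpgdp*0.001)", 
            paste(unique(mydata$id)[1:13], collapse = " + "),sep="+"))

result.rlm <- rlm(model, data = mydata)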

summary(result.rlm)

Call: rlm(formula = model, data = mydata)
Residuals:
    Min       1Q   Median       3Q      Max 
-9.17356 -1.21478  0.07208  1.25235  5.18840 

Coefficients:
                   Value     Std. Error t value  
(Intercept)        -333.2742   48.8993    -6.8155
log(gdp.i * gdp.j)    1.2882    0.0745    17.2959
log(dis)             37.6566    6.1469     6.1262
log(Sij)              1.4847    0.6695     2.2174
AFC                   0.4229    0.2791     1.5152
GFC                  -0.0674    0.2331    -0.2892
I(dpgdp * 0.001)     -0.0819    0.0137    -5.9591
IDN_MYS              20.1923    3.4593     5.8371
IDN_PHL             -13.8168    1.9475    -7.0948
IDN_SGP              34.1516    5.4185     6.3028
MYS_SGP              72.8390   11.7699     6.1886
IDN_THA              -5.9810    0.8392    -7.1271
MYS_THA              20.6914    3.3979     6.0894
SGP_THA              15.8244    2.4896     6.3563
IDN_VNM             -16.0143    2.4548    -6.5236
THA_VNM              28.0544    4.5140     6.2150
PHL_MYS              -7.0767    1.1393    -6.2115
PHL_SGP              -3.3307    0.7094    -4.6954
PHL_THA              -2.5637    0.5553    -4.6163
PHL_VNM               4.8306    1.0629     4.5445

Residual standard error: 1.827 on 1455 degrees of freedom
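
Another possibility, sketched here on synthetic data and not on mydata, is to let lm() decide which column is aliased and pass the remaining columns to the matrix interface of rlm(). The data frame toy, its columns and the coefficient values used to simulate y are all made up; as in the question, dis is constant within each id, so the full design matrix is singular.

```r
library(MASS)

set.seed(1)
# Synthetic data: 4 ids with 10 observations each, dis fixed per id
toy <- data.frame(
  id  = rep(c("A_B", "A_C", "B_C", "B_D"), each = 10),
  dis = rep(c(100, 250, 400, 800), each = 10),
  x   = rnorm(40)
)
toy$y <- 1 + 0.5 * toy$x + 0.3 * log(toy$dis) + rnorm(40)

# lm() fits despite the singularity and reports NA for the aliased term
fit_lm <- lm(y ~ x + log(dis) + factor(id), data = toy)
keep <- !is.na(coef(fit_lm))

# Full design matrix (intercept included), minus the aliased column(s)
X <- model.matrix(fit_lm)[, keep, drop = FALSE]

# Matrix interface of rlm(): X already contains the intercept column
fit_rlm <- rlm(X, toy$y)
coef(fit_rlm)
```

With this route the robust fit keeps the same set of terms as the lm() fit, so the two sets of coefficients can be compared term by term.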


 