
Reporting average marginal effects of a survey-weighted logit model with R

I'm working with survey data from a complex sample to estimate binary outcome models. I am trying to report the average marginal effects (AMEs) of a logit model, which I estimated with svyglm from the survey package in R. However, I get the following error when I use margins from the package of the same name:

margins(fit, design = lapop) %>% summary()

Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'summary': arguments imply differing number of rows: 6068, 6054

It does not seem to come from the summary function, since the error already pops up when executing the margins command with its arguments. I have tried simply ignoring the survey weights altogether, which gives me the same coefficients and AMEs but not the standard errors. Obviously, I cannot present this work while ignoring the survey weights. So I guess what I really need is the standard errors.

I have been reading up on the topic and have found no clear solution. I suspect it might have something to do with missing values in the model's covariates, but as with any other linear model, R should just be working with the complete cases.
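A minimal sketch of that complete-case behaviour, using base R's glm() on made-up data (the data frame toy and its columns are hypothetical, not the survey data). It only illustrates that a fitted model can end up with fewer rows than the data it was fitted on, which is the kind of mismatch the error reports; whether svyglm and margins handle this the same way is an assumption here.

```r
# Toy data (hypothetical): 10 rows, 2 of them with a missing covariate
toy <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  x = c(2.1, 3.4, NA, 1.5, 4.8, 2.2, NA, 3.9, 1.1, 2.7)
)

# With the default na.action (na.omit), glm() silently drops incomplete rows
toy_fit <- glm(y ~ x, data = toy, family = binomial(link = "logit"))

n_data  <- nrow(toy)                   # rows in the original data
n_model <- nrow(model.frame(toy_fit))  # rows actually used in the fit

c(data = n_data, model = n_model)
```

Any code that later combines model-based quantities with columns taken from the original data has to account for this difference in length.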

I'm not sure whether anybody knows more about this, or whether I should simply report AMEs without standard errors (and thus without p-values). I have uploaded an MWE in case anyone is interested, which can be found here.

The error message indicates that there are NAs in some rows that R does not automatically exclude. First, I tried to reproduce the error using both fit and lapop, and the error did pop up.

margins(fit, design = lapop)

#Error in data.frame(..., check.rows = FALSE, check.names = FALSE, fix.empty.names = FALSE,  : 
# arguments imply differing number of rows: 6068, 6054

Then, I tried to confirm which variable has the problematic NAs.

margins(fit)

#Note: Estimating marginal effects without survey weights. Specify 'design' to adjust for weighting.
#Error in data.frame(..., check.rows = FALSE, check.names = FALSE, fix.empty.names = FALSE,  : 
# arguments imply differing number of rows: 6068, 6054

The same error message popped up, so I believe fit contains the NAs. Then I checked how fit is produced in your code:

fit<-svyglm(ctol ~ y16 + age,
            design = lapop,
            family = quasibinomial(link = 'logit'))

The NAs must be in one of the ctol, y16, or age columns. Checking them, I found NAs in age:

> str(df46$age)

 dbl+lbl [1:3034] 30, 62, 25, 38, 24, 76, 39, 16, 71, 62, 29, 27, 60, 41, 22, 20, NA, 5...
 @ labels: Named num [1:4] NA 888 988 0
  ..- attr(*, "names")= chr [1:4] "Don't Know" "ns" "nr" "No sabe/No responde"
 @ label : chr "Age"

Then, I checked how many NAs there are in the age column and where they are located.

which(is.na(df46$age))

[1]   17   28  802  888 1045 2401 2898
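The same check can be run over every variable in the model formula at once instead of one column at a time. A minimal sketch on a made-up data frame (toy and its values are hypothetical; with the real data, df46 would take its place):

```r
# Hypothetical stand-in for df46 with the three model variables
toy <- data.frame(
  ctol = c(1, 0, 1, NA, 0, 1),
  y16  = c(0, 1, 1, 0, 1, 0),
  age  = c(30, NA, 25, 38, NA, 76)
)

# Variable names taken from the model formula: "ctol" "y16" "age"
model_vars <- all.vars(ctol ~ y16 + age)

# Number of missing values per model variable
na_counts <- colSums(is.na(toy[, model_vars]))

# Rows that are incomplete on at least one model variable
incomplete_rows <- which(!complete.cases(toy[, model_vars]))

na_counts
incomplete_rows
```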

There are 7 NAs. I suspect this number relates to the numbers in the error message, because there are 3034 rows in df46. Double that and you get 6068. Double the number of NAs and you get 14, and 6068 - 14 = 6054, the exact number shown in the error message.
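That arithmetic, written out as a quick check. The factor of 2 is only the pattern observed in the numbers, not something confirmed from the margins source code:

```r
n_rows <- 3034   # rows in df46
n_na   <- 7      # missing values found in age

full_rows     <- 2 * n_rows            # first number in the error message
complete_rows <- 2 * (n_rows - n_na)   # second number in the error message

c(full = full_rows, complete = complete_rows)
```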

Then, I excluded those seven rows from df46 to keep only the complete cases, and re-created lapop and fit from the complete cases.

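# Keep only rows where age is observed (logical indexing is also safe when nothing is missing)
df46_complete <- df46[!is.na(df46$age), ]
# Rebuild the survey design and refit the model on the complete cases
lapop <- svydesign(ids = ~ upm,
                   strata = ~ estratopri,
                   weights = ~ weight1500,
                   nest = TRUE,
                   data = df46_complete)
fit <- svyglm(ctol ~ y16 + age,
              design = lapop,
              family = quasibinomial(link = 'logit'))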

Finally, no error pops up when I run margins():

margins(fit, design = lapop) %>% summary()

# factor     AME     SE       z      p   lower   upper
#    age -0.0026 0.0004 -6.0633 0.0000 -0.0035 -0.0018
#    y16  0.1323 0.0187  7.0638 0.0000  0.0962  0.1696
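As a rough sanity check on point estimates like these, the AME of a continuous covariate in a logit model can be computed by hand as the weighted mean of the coefficient times the logistic density at each observation's linear predictor. The sketch below uses simulated data and base R's glm() with weights instead of svyglm(), so it covers only the point estimate, not the design-based standard errors. All names (toy, w, ame_age) are made up for illustration.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; not the survey data)
n   <- 500
toy <- data.frame(
  age = round(runif(n, 18, 80)),
  w   = runif(n, 0.5, 2)            # made-up sampling weights
)
toy$y <- rbinom(n, 1, plogis(1 - 0.03 * toy$age))

# Weighted logit; quasibinomial avoids warnings about non-integer weights
toy_fit <- glm(y ~ age, data = toy, weights = w,
               family = quasibinomial(link = "logit"))

# For a logit, dP/dx = beta * f(eta), where f is the logistic density
eta     <- predict(toy_fit, type = "link")
beta    <- coef(toy_fit)[["age"]]
ame_age <- weighted.mean(beta * dlogis(eta), w = toy$w)

ame_age
```

Because the logistic density never exceeds 0.25, the AME computed this way always has the same sign as the coefficient and is at most a quarter of its size in absolute value.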

I found out what was happening: it turns out to be a bug in the package's code. You can take a look at the specifics here. The solution as of today is to install Tomasz Żółtak's forks of the prediction and margins packages until his pull requests on GitHub are merged.

devtools::install_github("tzoltak/prediction")
devtools::install_github("tzoltak/margins")

The devtools package must be installed beforehand, if it isn't already:

install.packages('devtools')

After doing this, running margins() on a model should no longer produce errors when the model's data frame has missing values in some or all of the model's covariates. It will then report average marginal (partial) effects with their corresponding survey-weighted standard errors. Check out an MWE here.

Hopefully, installing margins directly from CRAN will in the future be enough to avoid this error with survey-weighted models.


 