I'm working with complex-sample survey data to estimate binary outcome models. I am trying to report average marginal effects (AMEs) from a logit model, which I estimated with svyglm from the survey package in R. However, I get the following error when I use margins from the package of the same name:
margins(fit, design = lapop) %>% summary()
Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'summary': arguments imply differing number of rows: 6068, 6054
The problem does not seem to be the summary function, since the error already appears when I run the margins command with its arguments on its own. I have also tried ignoring the survey weights altogether; that gives me the same coefficients and AMEs, but not the standard errors. Obviously, I cannot present this work while ignoring the survey weights, so what I really need is the standard errors.
I have been reading on the topic and have found no clear solution. I suspect it might have something to do with missing values in the model's covariates, but as with any other linear model, R should simply be working with complete cases.
I'm not sure if anybody knows anything about this, or if I should simply report AMEs without standard errors (and thus without p-values). I have uploaded an MWE for anyone interested, which can be found here.
The error message indicates that some rows contain NAs that R does not automatically exclude. First, I tried to reproduce the error using both the fit and lapop objects, and the error did pop up.
margins(fit, design = lapop)
#Error in data.frame(..., check.rows = FALSE, check.names = FALSE, fix.empty.names = FALSE, :
# arguments imply differing number of rows: 6068, 6054
Next, I checked which of the two objects carries the problematic NAs by calling margins on fit alone, without the design.
margins(fit)
#Note: Estimating marginal effects without survey weights. Specify 'design' to adjust for weighting.
#Error in data.frame(..., check.rows = FALSE, check.names = FALSE, fix.empty.names = FALSE, :
# arguments imply differing number of rows: 6068, 6054
The same error message popped up, so I believe fit contains the NAs. Then I checked how fit is produced in your code:
fit<-svyglm(ctol ~ y16 + age,
design = lapop,
family = quasibinomial(link = 'logit'))
The NAs therefore had to be in at least one of the ctol, y16, or age columns. I found NAs in age:
> str(df46$age)
dbl+lbl [1:3034] 30, 62, 25, 38, 24, 76, 39, 16, 71, 62, 29, 27, 60, 41, 22, 20, NA, 5...
@ labels: Named num [1:4] NA 888 988 0
..- attr(*, "names")= chr [1:4] "Don't Know" "ns" "nr" "No sabe/No responde"
@ label : chr "Age"
Then, I checked how many NAs there are in the age column and where they are located.
which(is.na(df46$age))
[1] 17 28 802 888 1045 2401 2898
There are 7 NAs. I suspect this count explains the numbers in the error message, because df46 has 3034 rows. Double that and you get 6068. Double the number of NAs and you get 14, and 6068 - 14 = 6054, exactly the number shown in the error message.
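As a quick sanity check, the arithmetic can be reproduced in R. This is only a sketch of the calculation; n_rows and n_na are illustrative names for the two counts reported above, and the code does not explain why margins ends up with doubled row counts.

```r
# Counts taken from the output above
n_rows <- 3034  # rows in df46
n_na   <- 7     # rows where age is NA

# The two row counts that appear in the error message
all_rows_doubled      <- 2 * n_rows           # 6068
complete_rows_doubled <- 2 * (n_rows - n_na)  # 6054

print(c(all_rows_doubled, complete_rows_doubled))
```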
Then, I excluded those seven rows from df46 to get the complete cases, and re-created lapop and fit from the complete cases.
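# Row indices of df46 where age is missing
ind <- which(is.na(df46$age))
# Drop those rows (ind is non-empty here, so negative indexing is safe)
df46_complete <- df46[-ind, ]
# Rebuild the survey design on the complete cases only
lapop <- svydesign(ids = ~ upm,
                   strata = ~ estratopri,
                   weights = ~ weight1500,
                   nest = TRUE,
                   data = df46_complete)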
fit<-svyglm(ctol ~ y16 + age,
design = lapop,
family = quasibinomial(link = 'logit'))
Finally, no error pops up when I run margins():
margins(fit, design = lapop) %>% summary()
# factor AME SE z p lower upper
# age -0.0026 0.0004 -6.0633 0.0000 -0.0035 -0.0018
# y16 0.1323 0.0187 7.0638 0.0000 0.0962 0.1696
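The manual check above can be generalized to every variable in the model formula. The following is a minimal base-R sketch on a made-up data frame: toy is hypothetical and merely stands in for df46, and its values are invented. It counts the NAs in each model variable and keeps only the rows that are complete on all of them.

```r
# Hypothetical stand-in for df46; the values are invented for illustration
toy <- data.frame(
  ctol = c(1, 0, 1, NA, 0, 1),
  y16  = c(0, 1, 1, 0, NA, 1),
  age  = c(30, 62, NA, 38, 24, 76)
)

# Names of the variables used by the model formula
model_vars <- all.vars(ctol ~ y16 + age)

# Number of NAs in each model variable
na_counts <- colSums(is.na(toy[, model_vars]))
print(na_counts)

# Keep only the rows that are complete on every model variable
keep <- complete.cases(toy[, model_vars])
toy_complete <- toy[keep, ]
print(nrow(toy_complete))
```

With the real data, the filtered data frame would take the place of df46_complete in the svydesign() call. Logical indexing also behaves sensibly when nothing is missing, whereas df46[-ind, ] would return zero rows if ind were empty.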
I found out what was happening: it turns out to be a bug in the package's code. You can take a look at the specifics here. The solution as of today is to install Tomasz Żółtak's forks of the prediction and margins packages until his pull requests on GitHub are merged.
devtools::install_github("tzoltak/prediction")
devtools::install_github("tzoltak/margins")
These commands require the devtools package, so install it first if you haven't already.
install.packages('devtools')
After doing this, running margins() on a model should no longer produce errors when the model's data frame has missing values on some or all of the model's covariates. It will then report average partial effects with their corresponding survey-weighted standard errors. Check out an MWE here.
Hopefully, in the future, the CRAN version of margins will handle survey-weighted models without producing this error.