
Outputting the N's using the survey package (svymean)

I have data like the example below, and I am trying to use the survey package to apply weights and find the mean, SE and N for each variable.

I was able to find the mean and SE, but I don't know how to pull the N for each variable.

library(survey)
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
vector_of_variables <- c( 'api00' , 'api99' )
result <- 
    lapply( 
        vector_of_variables , 
        function( w ) svymean( as.formula( paste( "~" , w ) ) , dclus1 , na.rm = TRUE ) 
    )

result <- lapply( result , function( v ) data.frame( variable = names( v ) , mean = coef( v ) , se = as.numeric( SE( v ) ) ) )

do.call( rbind , result )

Any suggestions?


EDIT

I've adapted the answer given below to expand my question:

library(survey)
library(dplyr)  # provides %>%, mutate() and case_when() used below
data(api)
apiclus1 <- 
  apiclus1 %>% 
  mutate(pw2 = pw*0.8) %>%
  mutate(part = case_when(full<80 ~"part 1", TRUE~"part 2"))

dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

dclus2 <- svydesign(id=~dnum, weights=~pw2, data=apiclus1, fpc=~fpc)

meanseN<-function(variable,design, part,shc.wide){
  formula<-make.formula(variable)
  m <-svymean(formula, subset(design, part==part, shc.wide = shc.wide),na.rm=TRUE)
  N<-unwtd.count(formula, subset(design, part==part, shc.wide = shc.wide),na.rm=TRUE)
  c(mean=coef(m), se=SE(m), N=coef(N))
}

vector_of_variables <- c("acs.k3","api00")
 


sapply(vector_of_variables, meanseN, "part 1","No",design=dclus1)

                     acs.k3     api00
mean.acs.k3  20.0347222 644.16940
se            0.5204887  23.54224
N.counts    144.0000000 183.00000

As you can see, I subset the design (dclus1) inside the function, so the N's I expect to see should match the subgroup sizes:

table(apiclus1$sch.wide, apiclus1$part)

      part 1 part 2
  No       4     19
  Yes     30    130

unwtd.count is returning the count for the full sample of data instead of the subset. Any ideas why this might be happening?

You don't actually need the survey package functions to do this. The number of observations is whatever it is; it's not a population estimate based on the design. However, the package does have the function unwtd.count to get the unweighted count of non-missing observations, e.g.

> unwtd.count(~api00, dclus1)
       counts SE
counts    183  0
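
To illustrate the point that N is just a count of rows, here is a minimal sketch comparing a base R count of non-missing values with unwtd.count. It assumes the dclus1 design from the question; n_base and n_survey are illustrative names, not part of the survey package.

```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Base R: number of non-missing observations of api00
n_base <- sum(!is.na(apiclus1$api00))

# survey: the same unweighted count, taken from the design object
n_survey <- unname(coef(unwtd.count(~api00, dclus1)))

# The two counts should agree, since the weights play no role here
n_base == n_survey
```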

If you want all three things in a loop like you were doing before, it's easiest to write a little function rather than doing it all in one line:

meanseN<-function(variable,design){
    formula<-make.formula(variable)
    m <-svymean(formula, design,na.rm=TRUE)
    N<-unwtd.count(formula, design)
    c(mean=coef(m), se=SE(m), N=coef(N))
}

and do something like

> sapply(vector_of_variables, meanseN,design=dclus1)
               api00     api99
mean.api00 644.16940 606.97814
se          23.54224  24.22504
N.counts   183.00000 183.00000
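
For the subsetting problem in the EDIT, a likely cause is name masking. Inside subset(), the expression part==part is evaluated against the design's variables, so both sides refer to the part column and every row is kept; shc.wide = shc.wide is passed as an extra named argument rather than as a filter condition. The following is a minimal sketch of one way around this, not the only one. It assumes the part column defined in the EDIT (rebuilt here in base R), and the names meanseN_sub, part_value and sch_value are hypothetical, chosen so they cannot collide with column names. It uses api00 and api99 because they have no missing values in this sample.

```r
library(survey)
data(api)

# Recreate the grouping column from the EDIT (base R instead of dplyr);
# rows with missing `full` go to "part 2", as with case_when(TRUE ~ ...)
apiclus1$part <- ifelse(!is.na(apiclus1$full) & apiclus1$full < 80,
                        "part 1", "part 2")
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

meanseN_sub <- function(variable, design, part_value, sch_value) {
  formula <- make.formula(variable)
  vars <- design$variables
  # Build the filter explicitly so argument names and column names stay distinct
  keep <- vars$part == part_value & vars$sch.wide == sch_value
  keep <- keep & !is.na(keep)
  d <- design[keep, ]                  # subset the design, not the raw data
  m <- svymean(formula, d, na.rm = TRUE)
  N <- unwtd.count(formula, d)
  c(mean = unname(coef(m)), se = as.numeric(SE(m)), N = unname(coef(N)))
}

res <- sapply(c("api00", "api99"), meanseN_sub,
              design = dclus1, part_value = "part 1", sch_value = "No")
res
```

If this behaves as intended, the N row should equal the corresponding cell of table(apiclus1$sch.wide, apiclus1$part) for variables without missing values. Subsetting the design object, rather than filtering the data frame before calling svydesign, lets the survey package keep the design information it needs for the standard errors.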
