I have data like the example below. I am trying to use the survey package to apply weights and find the mean, SE and N for each variable.
I was able to get the mean and SE, but I don't know how to pull the N for each variable.
```r
library(survey)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
vector_of_variables <- c("api00", "api99")

result <- lapply(
  vector_of_variables,
  function(w) svymean(as.formula(paste("~", w)), dclus1, na.rm = TRUE)
)
result <- lapply(result, function(v) data.frame(
  variable = names(v),
  mean = coef(v),
  se = as.numeric(SE(v))
))
do.call(rbind, result)
```
Any suggestions?
I've adapted the answer given below to expand my question:
```r
library(survey)
library(dplyr)  # needed for %>%, mutate() and case_when()

data(api)
apiclus1 <- apiclus1 %>%
  mutate(pw2 = pw * 0.8) %>%
  mutate(part = case_when(full < 80 ~ "part 1", TRUE ~ "part 2"))

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
dclus2 <- svydesign(id = ~dnum, weights = ~pw2, data = apiclus1, fpc = ~fpc)

meanseN <- function(variable, design, part, shc.wide) {
  formula <- make.formula(variable)
  m <- svymean(formula, subset(design, part == part, shc.wide = shc.wide), na.rm = TRUE)
  N <- unwtd.count(formula, subset(design, part == part, shc.wide = shc.wide), na.rm = TRUE)
  c(mean = coef(m), se = SE(m), N = coef(N))
}

vector_of_variables <- c("acs.k3", "api00")
sapply(vector_of_variables, meanseN, "part 1", "No", design = dclus1)
```

```
                 acs.k3     api00
mean.acs.k3  20.0347222 644.16940
se            0.5204887  23.54224
N.counts    144.0000000 183.00000
```
As you can see, I subset the design (dclus1), so the Ns I expect to see for each combination of sch.wide and part are:
```r
> table(apiclus1$sch.wide, apiclus1$part)

      part 1 part 2
  No       4     19
  Yes     30    130
```
unwtd.count is returning the count for the full sample instead of the subset. Any ideas why this might be happening?
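A likely cause, worth checking: `subset()` on a survey design evaluates its condition against the design's own variables first and only then looks in the calling function. Inside `meanseN`, `part == part` therefore compares the `part` column with itself and keeps every row, while `shc.wide = shc.wide` (a misspelling of `sch.wide`) is passed as a separate argument rather than combined into the condition, so it is ignored. That fits the output above: the api00 mean of 644.16940 is the same as the full-sample mean, not just the N. The sketch below is one way to write it, assuming hypothetical argument names `part_value` and `sch_wide_value` chosen so they cannot collide with column names. It builds `part` with base R `ifelse()` instead of dplyr (same rule as the `case_when()` call) and uses api00 and api99, which have no missing values, so the N for api00 should match the corresponding cell of the table above.

```r
library(survey)
data(api)

# Same 'part' rule as the question's case_when(), in base R:
# rows with missing 'full' fall through to "part 2", as with case_when()
apiclus1$part <- ifelse(!is.na(apiclus1$full) & apiclus1$full < 80, "part 1", "part 2")

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# part_value and sch_wide_value are hypothetical names; they only need to differ
# from column names, because subset() looks up names in the design's variables
# before it looks in this function's environment.
meanseN <- function(variable, design, part_value, sch_wide_value) {
  formula <- make.formula(variable)
  sub <- subset(design, part == part_value & sch.wide == sch_wide_value)
  m <- svymean(formula, sub, na.rm = TRUE)  # weighted mean and SE within the subset
  N <- unwtd.count(formula, sub)            # unweighted count of non-missing values in the subset
  c(mean = as.numeric(coef(m)), se = as.numeric(SE(m)), N = as.numeric(coef(N)))
}

vector_of_variables <- c("api00", "api99")
res <- sapply(vector_of_variables, meanseN, design = dclus1,
              part_value = "part 1", sch_wide_value = "No")
res
```

Subsetting the design, rather than the data frame before calling `svydesign()`, keeps the full-sample cluster information, which is what the survey package needs for correct domain standard errors.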
You don't actually need the survey package functions for this. The number of observations is simply a count; it's not a population estimate based on the design. However, the package does have the function unwtd.count to get the unweighted count of non-missing observations, e.g.:
```r
> unwtd.count(~api00, dclus1)
       counts SE
counts    183  0
```
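Because N does not depend on the design, it can also be counted straight from the data frame the design was built on. A minimal sketch comparing a base R count of non-missing values with `unwtd.count()`, assuming the unsubsetted `dclus1` design; for a subset you would count within the same rows instead.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
vector_of_variables <- c("api00", "api99")

# Base R: number of non-missing values per variable, ignoring the design
n_base <- colSums(!is.na(apiclus1[vector_of_variables]))

# survey package: unweighted count of non-missing values, one variable at a time
n_survey <- sapply(vector_of_variables, function(w) {
  as.numeric(coef(unwtd.count(make.formula(w), dclus1)))
})

n_base
n_survey
```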
If you want all three things in a loop like you were doing before, it's easiest to write a small function rather than doing it in one line:
```r
meanseN <- function(variable, design) {
  formula <- make.formula(variable)
  m <- svymean(formula, design, na.rm = TRUE)  # weighted mean and its SE
  N <- unwtd.count(formula, design)            # unweighted count of non-missing values
  c(mean = coef(m), se = SE(m), N = coef(N))
}
```
and then do something like:
```r
> sapply(vector_of_variables, meanseN, design = dclus1)
               api00     api99
mean.api00 644.16940 606.97814
se          23.54224  24.22504
N.counts   183.00000 183.00000
```
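If you prefer the one-row-per-variable data frame from the original question, the same three quantities can be returned by each `lapply()` call and bound together, which also avoids the `mean.api00` / `N.counts` row labels that `sapply()` produces. A sketch, where `summarise_variable` is a hypothetical helper name:

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
vector_of_variables <- c("api00", "api99")

# Hypothetical helper: one data.frame row per variable with the weighted mean,
# its SE, and the unweighted count of non-missing values
summarise_variable <- function(w, design) {
  f <- make.formula(w)
  m <- svymean(f, design, na.rm = TRUE)
  n <- unwtd.count(f, design)
  data.frame(variable = w,
             mean = as.numeric(coef(m)),
             se = as.numeric(SE(m)),
             n = as.numeric(coef(n)))
}

result <- do.call(rbind, lapply(vector_of_variables, summarise_variable, design = dclus1))
result
```

The numbers should agree with the `sapply()` output above; only the layout differs.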