
How to replicate SUDAAN 75th percentile and 95% confidence intervals by age groups in R's 'survey' package?

I'm trying to replicate quantile estimates with 95% confidence intervals by age group, originally produced with SAS and SUDAAN, using the 'survey' package in R and NHANES data. The package's 'svyby' function combined with its 'svyquantile' function makes this analysis easy to run; my results are close to, but not exactly the same as, the results generated by SUDAAN.

I believe this may be due to the arguments that the 'svyby' and 'svyquantile' functions let you customize. The arguments the 'svyquantile' function takes include 'method', 'interval.type', 'ties', 'return.replicates', etc.

I've found an article that explains how to replicate some SUDAAN functions with the 'survey' package, but it does not explain how to replicate quantile estimates. From some research on how SUDAAN estimates quantiles, I believe the 'method' argument should be set to 'linear'. Beyond that, I've tried different values for the other arguments, but have not managed to replicate the SUDAAN estimates exactly.

Does anyone know how to replicate SUDAAN quantile estimates and 95% confidence intervals by groups, or have any documentation on the methodology SUDAAN uses in order to better replicate this analysis using the 'survey' package in R?

In the code below, I've shown my approach. The results of the 'svyby' function seem like reasonable estimates, but they are not identical to the results produced by SUDAAN and SAS. I don't have access to SUDAAN and SAS; my objective is to replicate their results in R. Specifically, the 75th percentile for the 60+ age group according to SUDAAN and SAS for PCB 118 is 25.89 ng/g lipid (95% CI: 22.97-30.17). Thank you.

library(RNHANES)
library(survey)

# import NHANES 2003-2004 PCB dataset, merged with the demographics file
pcbs <- nhanes_load_data("L28DFP_C", "2003-2004", demographics = TRUE)

# create appropriate age groups
# (explicit labels avoid relying on the locale-dependent sort order that
# as.factor() followed by a positional levels() assignment depends on)
pcbs$age <- cut(pcbs$RIDAGEYR,
                breaks = c(-Inf, 19, 39, 59, Inf),
                labels = c("<20", "20-39", "40-59", "60+"))

# assign survey design
nhanes.dsgn <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTSC2YR,
                         data = pcbs, nest = TRUE)

# quantiles for subpopulations
svyby(~LBX118LA, ~age, nhanes.dsgn, svyquantile, quantiles = 0.75,
      ci = TRUE, alpha = 0.05, vartype = "ci", na.rm = TRUE, method = "linear")

From the documentation of the 'survey' package: "Combining interval.type="betaWald" and ties="discrete" is (close to) the proposal of Shah and Vaish(2006) used in some versions of SUDAAN."

So,

PCB118LA <- svyby(~LBX118LA, ~age, nhanes.dsgn, svyquantile, quantiles = 0.75,
                  ci = TRUE, alpha = 0.05, vartype = "ci", na.rm = TRUE,
                  method = "linear", ties = "discrete", interval.type = "betaWald")
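
One caveat, offered as a sketch rather than a verified recipe: 'svyquantile' was rewritten in 'survey' 4.1, and the interface quoted above ('method', 'ties', interval.type = "betaWald") was kept as 'oldsvyquantile'. The rewritten function selects the Shah and Vaish rule with qrule = "shahvaish". The helper below, quantile_by_group, is a hypothetical wrapper (not part of 'survey') that picks the arguments by installed version. Using interval.type = "beta" as the nearest counterpart of "betaWald" is my assumption, and I have not checked either branch against SUDAAN output.

```r
library(survey)

# Hypothetical helper, not part of 'survey': grouped quantile with a CI,
# using the Shah & Vaish style settings for whichever interface is installed.
quantile_by_group <- function(formula, by, design, q = 0.75) {
  if (packageVersion("survey") >= "4.1") {
    # Rewritten svyquantile(): the quantile rule is chosen with 'qrule'.
    # interval.type = "beta" is an assumed stand-in for the old "betaWald".
    svyby(formula, by, design, svyquantile, quantiles = q,
          ci = TRUE, vartype = "ci", na.rm = TRUE,
          qrule = "shahvaish", interval.type = "beta")
  } else {
    # Older interface, with the arguments used in the call above.
    svyby(formula, by, design, svyquantile, quantiles = q,
          ci = TRUE, vartype = "ci", na.rm = TRUE,
          method = "linear", ties = "discrete", interval.type = "betaWald")
  }
}

# Usage with the design defined in the question:
# quantile_by_group(~LBX118LA, ~age, nhanes.dsgn, q = 0.75)
```

If the result still differs from the published 25.89 (22.97-30.17), the remaining gap may come from the degrees of freedom or the interval method rather than the quantile rule itself.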
