How to replicate SUDAAN 75th percentile and 95% confidence intervals by age groups in R's 'survey' package?
I'm trying to replicate, in R's 'survey' package, the quantile estimates with 95% confidence intervals by age group that SAS and SUDAAN produce for NHANES data. The package's 'svyby' function combined with its 'svyquantile' function makes this analysis quite easy; my results are close to, but not exactly the same as, the results generated by SUDAAN.
I believe the difference may come from the many arguments that 'svyby' and 'svyquantile' let you customize. The arguments 'svyquantile' takes include 'method', 'interval.type', 'ties', 'return.replicates', etc.
I've found this article, which explains how to replicate some SUDAAN functions with the 'survey' package but does not explain how to replicate quantile estimates. From some research on how SUDAAN estimates quantiles, I believe the 'method' argument should be set to 'linear'. Beyond that, I've tried setting the other arguments to different values, but have not managed to replicate the SUDAAN estimates exactly.
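One way to see how much each of these settings actually moves the estimate is to loop over them. The sketch below is an illustration, not the asker's code: it assumes survey >= 4.1, where the pre-4.1 interface with 'method', 'ties' and 'interval.type' is kept as oldsvyquantile(), and it uses the package's bundled 'apistrat' data as a stand-in for NHANES, so its numbers say nothing about the SUDAAN target.

```r
# Sketch: how much do the pre-4.1 svyquantile() settings move the estimate?
# Assumes survey >= 4.1 (old interface kept as oldsvyquantile());
# the bundled 'apistrat' data stand in for NHANES (illustrative only).
library(survey)
data(api)

# stratified design shipped with the package
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# every combination of the settings discussed in the question
settings <- expand.grid(method = c("linear", "constant"),
                        ties = c("discrete", "rounded"),
                        interval.type = c("Wald", "betaWald"),
                        stringsAsFactors = FALSE)

results <- do.call(rbind, lapply(seq_len(nrow(settings)), function(i) {
  s <- settings[i, ]
  q <- oldsvyquantile(~api00, dstrat, quantiles = 0.75, ci = TRUE,
                      method = s$method, ties = s$ties,
                      interval.type = s$interval.type)
  # $quantiles holds the point estimate; $CIs holds the lower and upper limits
  data.frame(s, estimate = q$quantiles[1],
             lower = q$CIs[1], upper = q$CIs[2], row.names = NULL)
}))

print(results)
```

Running the same loop on the NHANES design (with ~LBX118LA and a subset for the 60+ group) would show which combination comes closest to the SUDAAN figure.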
Does anyone know how to replicate SUDAAN quantile estimates and 95% confidence intervals by group, or have any documentation on the methodology SUDAAN uses, so that I can better replicate this analysis with the 'survey' package in R?
My approach is shown in the code below. The results of the 'svyby' function look like reasonable estimates; however, they are not identical to the results produced by SUDAAN and SAS. I don't have access to SUDAAN or SAS, but my objective is to replicate their results in R. Specifically, according to SUDAAN and SAS, the 75th percentile of PCB 118 for the 60+ age group is 25.89 ng/g lipid (95% CI: 22.97-30.17).
Thank you.
library(RNHANES)
library(survey)
# import the NHANES 2003-2004 PCB dataset, merged with demographics
pcbs <- nhanes_load_data("L28DFP_C", "2003-2004", demographics = TRUE)
# create age groups; cut() with explicit labels keeps each label attached
# to the right age range (levels(x) = ... would only rename the sorted levels)
pcbs$age <- cut(pcbs$RIDAGEYR,
                breaks = c(-Inf, 20, 40, 60, Inf),
                labels = c("<20", "20-39", "40-59", "60+"),
                right = FALSE)
# assign survey design (PSUs nested within strata)
nhanes.dsgn <- svydesign(id = ~SDMVPSU, strata = ~SDMVSTRA, weights = ~WTSC2YR,
                         data = pcbs, nest = TRUE)
# 75th percentile with 95% CI for each age group
# (method = "linear" is an argument of the pre-4.1 svyquantile(),
#  which newer versions of 'survey' provide as oldsvyquantile())
svyby(~LBX118LA, ~age, nhanes.dsgn, svyquantile, quantiles = 0.75, ci = TRUE,
      alpha = 0.05, vartype = "ci", na.rm = TRUE, method = "linear")
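A pitfall with building the age factor through as.factor() followed by levels(x) = ...: as.factor() orders the levels by the current locale's collation, and assigning to levels() then renames those levels position by position instead of reordering them. In the C locale "<" sorts after the digits, so "<20" becomes the last level and every age group ends up with the wrong label, 60+ included. The base-R sketch below (the ages vector is illustrative) forces C-locale ordering with sort(method = "radix") so the effect is reproducible on any machine, and contrasts it with factor() given explicit levels.

```r
# Base R only: why levels(x) = ... is risky for relabelling a factor.
ages <- c("60+", "<20", "20-39", "40-59")

# Force C-locale ordering (radix sort) so the example is reproducible;
# "<" (ASCII 60) sorts after the digits, so "<20" ends up last.
sorted_f <- factor(ages, levels = sort(unique(ages), method = "radix"))
print(levels(sorted_f))  # "20-39" "40-59" "60+" "<20"

# levels<- renames the existing levels position by position; it does not reorder
relabelled <- sorted_f
levels(relabelled) <- c("<20", "20-39", "40-59", "60+")

# factor() with explicit levels keeps every value attached to its own label
safe <- factor(ages, levels = c("<20", "20-39", "40-59", "60+"))

print(data.frame(original = ages,
                 relabelled = as.character(relabelled),
                 safe = as.character(safe)))
```

Whether the original grouping code is affected therefore depends on the locale R runs in; cut() or factor(..., levels = ...) avoids the question altogether.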
From the documentation of the 'survey' package: "Combining interval.type="betaWald" and ties="discrete" is (close to) the proposal of Shah and Vaish (2006) used in some versions of SUDAAN."
So:
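# pre-4.1 svyquantile() interface; in survey >= 4.1 these arguments belong to oldsvyquantile()
PCB118LA <- svyby(~LBX118LA, ~age, nhanes.dsgn, svyquantile, quantiles = 0.75, ci = TRUE,
                  alpha = 0.05, vartype = "ci", na.rm = TRUE, method = "linear",
                  ties = "discrete", interval.type = "betaWald")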
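In survey 4.1 and later, svyquantile() was rewritten: the Shah and Vaish rule is requested with qrule = "shahvaish", and interval.type = "beta" appears to be the counterpart of the old "betaWald" option. A minimal sketch of the svyby() call under that interface, assuming survey >= 4.1 and using the bundled 'apistrat' data in place of the NHANES file; whether this reproduces SUDAAN's 25.89 (22.97-30.17) exactly has not been checked here.

```r
# Sketch, assuming survey >= 4.1, where SUDAAN-style (Shah-Vaish) quantiles
# are selected with qrule = "shahvaish".
# The bundled 'apistrat' data stand in for the NHANES PCB file.
library(survey)
data(api)

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)

# 75th percentile of api00 by school type: Shah-Vaish quantile rule,
# beta-type confidence interval, limits returned as ci_l / ci_u columns
by_type <- svyby(~api00, ~stype, dstrat, svyquantile,
                 quantiles = 0.75, qrule = "shahvaish",
                 interval.type = "beta", ci = TRUE, vartype = "ci")
print(by_type)

est <- coef(by_type)  # one estimate per school type

# The same call on the design from the question would look like
# (not run here; it needs the NHANES download):
# svyby(~LBX118LA, ~age, nhanes.dsgn, svyquantile, quantiles = 0.75,
#       qrule = "shahvaish", interval.type = "beta", ci = TRUE,
#       vartype = "ci", na.rm = TRUE)
```

The quantile rule and the interval type are separate choices in this interface, so it may be worth trying qrule = "shahvaish" with each interval.type when chasing the SUDAAN limits.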