
Role of raw data in pooled estimates from mice (R package)?

I'm wondering what role the original data set plays when using the mice package in R for imputed data. I need to impute my data and then compute some additional variables before turning the long data set back into a mids object with as.mids. I noticed that, when computing my additional variable ("total" in the code below), whether I used na.rm=TRUE affected my estimates, and from my understanding it shouldn't. Here's a reproducible example:

# Load the required package
library(mice)

# Impute data and compute the row sum with na.rm = TRUE
imp1 <- mice(nhanes, seed = 123)
com1 <- complete(imp1, "long", include = TRUE)
head(com1)
com1$total <- rowSums(com1[4:6], na.rm = TRUE)
imp2 <- as.mids(com1)

# Fit the model on the data built with na.rm = TRUE
fit <- with(imp2, lm(bmi ~ age))
round(summary(pool(fit)), 2)

Notice that my variable "total" is the rowSums of 3 variables and that I've used na.rm=TRUE. However, since only the original data set (the rows where the variable ".imp" equals zero in the long data set) contains NA values, this extra bit of code should only be relevant for the original data. Removing na.rm=TRUE shows that this is not the case:

# Impute data and compute the row sum without na.rm = TRUE
imp3 <- mice(nhanes, seed = 123)
com2 <- complete(imp3, "long", include = TRUE)
head(com2)
com2$total <- rowSums(com2[4:6])
imp4 <- as.mids(com2)

# Fit the model on the data built without na.rm = TRUE
fit2 <- with(imp4, lm(bmi ~ age))
round(summary(pool(fit2)), 2)

Again, notice that leaving out na.rm=TRUE leads to different estimates. The only difference here is that the variable "total" now has NA values when the variable .imp is equal to zero (i.e., the original data set).
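To make that difference concrete, here is a minimal base R sketch. The data frame `toy` and its values are hypothetical stand-ins for the long-format output of complete(), not actual nhanes results; it only illustrates how rowSums treats NA with and without na.rm.

```r
# Hypothetical toy data standing in for the long-format output of complete().
# Rows with .imp == 0 play the role of the original data and still contain NAs.
toy <- data.frame(
  .imp = c(0, 0, 1, 1),
  .id  = c(1, 2, 1, 2),
  bmi  = c(NA, 22.7, 30.1, 22.7),
  hyp  = c(1, NA, 1, 1),
  chl  = c(187, 204, 187, 204)
)

vars <- c("bmi", "hyp", "chl")

# Without na.rm, any NA in a row makes the whole row sum NA.
toy$total_keep_na <- rowSums(toy[vars])

# With na.rm = TRUE, NAs are skipped and the remaining values are summed.
toy$total_drop_na <- rowSums(toy[vars], na.rm = TRUE)

# Flag the rows where the two versions of the total disagree.
differs <- is.na(toy$total_keep_na) | toy$total_keep_na != toy$total_drop_na

print(toy)
print(unique(toy$.imp[differs]))
```

In this toy case the two totals disagree only in the rows with .imp equal to zero, which matches the situation described above.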

What am I missing? I would have thought that only the imputed data would affect the pooled estimates, yet I have just shown that values in the original data set (i.e., those with .imp = 0) do. What is the role of the original data set in getting pooled estimates from mice?

NOTE: EDITED FOR CLARITY

I would expect the original (raw) data to play no role. According to the as.mids help page, it is only needed to indicate where the missing data are. I ran your script and noticed an error when creating imp2: you call the object com, which should be com1. After that correction I get exactly the same results for the two approaches:

# Load the required package
library(mice)

# Impute data and compute the row sum with na.rm = TRUE
imp1 <- mice(nhanes, seed = 123)
com1 <- complete(imp1, "long", include = TRUE)
head(com1)
com1$total <- rowSums(com1[4:6], na.rm = TRUE)
imp2 <- as.mids(com1)

# Fit the model on the data built with na.rm = TRUE
fit <- with(imp2, lm(bmi ~ age))

# Impute data and compute the row sum without na.rm = TRUE
imp3 <- mice(nhanes, seed = 123)
com2 <- complete(imp3, "long", include = TRUE)
head(com2)
com2$total <- rowSums(com2[4:6])
imp4 <- as.mids(com2)

# Fit the model on the data built without na.rm = TRUE
fit2 <- with(imp4, lm(bmi ~ age))

The results:

> round(summary(pool(fit)), 2)
            estimate std.error statistic    df p.value
(Intercept)    29.76      1.86     15.98 18.61    0.00
age            -1.73      0.95     -1.83 19.50    0.08

> round(summary(pool(fit2)), 2)
            estimate std.error statistic    df p.value
(Intercept)    29.76      1.86     15.98 18.61    0.00
age            -1.73      0.95     -1.83 19.50    0.08

In short, I think the different results may be due to an error in your code. I used mice 3.0.9.
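If you want to check this on your own installation rather than comparing printed tables by eye, something along these lines may help. It is only a sketch: the object names (long_a, long_b, pooled_a, pooled_b, same) are made up for illustration, it assumes the pooled summary exposes estimate and std.error columns as in the output above, and the result could differ across mice versions.

```r
library(mice)

# One imputation run, reused for both variants so the seed is the only source
# of randomness. printFlag = FALSE just silences the iteration log.
imp  <- mice(nhanes, seed = 123, printFlag = FALSE)
long <- complete(imp, "long", include = TRUE)
vars <- c("bmi", "hyp", "chl")

# Variant A: total computed with na.rm = TRUE (no NA left in the .imp == 0 rows).
long_a <- long
long_a$total <- rowSums(long_a[vars], na.rm = TRUE)

# Variant B: total computed without na.rm (NA remains in the .imp == 0 rows).
long_b <- long
long_b$total <- rowSums(long_b[vars])

# Convert back to mids objects and fit the same model on each.
fit_a <- with(as.mids(long_a), lm(bmi ~ age))
fit_b <- with(as.mids(long_b), lm(bmi ~ age))

pooled_a <- summary(pool(fit_a))
pooled_b <- summary(pool(fit_b))

# Compare pooled estimates and standard errors numerically.
same <- isTRUE(all.equal(pooled_a$estimate, pooled_b$estimate)) &&
  isTRUE(all.equal(pooled_a$std.error, pooled_b$std.error))
print(same)
```

Since the model bmi ~ age does not use total at all, the comparison would be expected to agree if the raw rows only mark where values were missing.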
