How to get the fitted values from a model run with R package MICE

I have fitted a GLMM with eight fixed effects and two random effects. Two of my fixed effects contained missing data, so I used the R package MICE to impute the missing values.

I want to create a graph with the fitted values from my model and the actual observed values. If I didn't have missing data and had used the package lme4 to run my model, I would simply use the function fitted() to get the model's fitted values. However, because I used MICE, I am unsure how to get the fitted values for my model. When I use the function fitted(), it returns NULL instead of a vector of the fitted values.

I've been scouring the internet for an example where someone has obtained a vector of fitted values after using MICE to impute missing data and run a GLMM, but I have not been able to find anything...

Does anyone know of a function or a way to calculate the fitted values from my model that was run using the MICE package? Or could you recommend another resource that could help?

Many thanks in advance, Olivia

Without a working example it is hard to figure out exactly what you ran into.

Nevertheless, here is an example using the mice and lme4 packages, fitting a (nonsense) model with lmer.

require(mice)
require(lme4)

dt <- mice(nhanes2, seed = 314)

mod <- with(dt, lme4::lmer(bmi ~ chl + (1 | hyp)))

summary(pool(mod))

gives:

Class: mipo    m = 5 
               estimate         ubar            b            t dfcom       df       riv    lambda       fmi
(Intercept) 21.74573850 14.857649384 2.799244e+00 1.821674e+01    21 13.85172 0.2260850 0.1843959 0.2811936
chl          0.02574043  0.000379629 7.879048e-05 4.741775e-04    21 13.36442 0.2490552 0.1993949 0.2972420
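As a side note on why fitted() returns NULL in the question: with() applied to the imputed data returns a list-like "mira" object rather than a single model, and the individual fits sit in its analyses element. The sketch below only inspects that structure, reusing the nhanes2 example above; the name fit1 is just for illustration.

```r
library(mice)
library(lme4)

# Same imputation and model as in the example above
dt  <- mice(nhanes2, seed = 314, printFlag = FALSE)
mod <- with(dt, lmer(bmi ~ chl + (1 | hyp)))

class(mod)            # includes "mira", not a single lmer fit
length(mod$analyses)  # one fitted model per imputed data set

# fitted() works on a single element of the list
fit1 <- fitted(mod$analyses[[1]])
head(fit1)
```

Calling fitted() on one element of mod$analyses gives the fitted values for that imputed data set only, so some way of combining the sets is still needed.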

Fit the model in a list column and extract the fitted values for each imputed data set. Then take the mean of the fitted values as a way to pool them. I am not sure if this is the recommended way of pooling fitted values, though. See also https://github.com/stefvanbuuren/mice/issues/82 for some expert advice.

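library(dplyr)
library(tidyr)
library(purrr)

dt %>%
  mice::complete(action = "long", include = FALSE) %>%
  group_by(.imp) %>%
  nest() %>%   # one row per imputation; the data sits in list column `data`
  ungroup() %>%
  # fit the model once per imputed data set
  mutate(mod = map(data, ~ lmer(bmi ~ chl + (1 | hyp), data = .x))) %>%
  # fitted values per model, with a row id to match observations
  mutate(fitted = map(mod, ~ data.frame(fitted = fitted(.x), id = seq_along(fitted(.x))))) %>%
  select(.imp, fitted) %>%
  unnest(cols = fitted) %>%
  group_by(id) %>%
  summarise(fitted = mean(fitted))   # average over imputations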

gives:

# A tibble: 25 x 2
      id fitted
   <int>  <dbl>
 1     1   27.1
 2     2   26.6
 3     3   26.6
 4     4   27.1
 5     5   24.7
 6     6   26.5
 7     7   24.8
 8     8   26.6
 9     9   27.9
10    10   27.0
# ... with 15 more rows
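The same averaging can be sketched without the tidyverse by pulling the fits out of the object returned by with(), and the result can then be plotted against the observed values, which is what the question asks for. This is a minimal sketch under the same caveat as above (a plain mean may not be the recommended way to pool fitted values); the names fit_mat, fit_avg, observed and keep are illustrative only.

```r
library(mice)
library(lme4)

dt  <- mice(nhanes2, seed = 314, printFlag = FALSE)
mod <- with(dt, lmer(bmi ~ chl + (1 | hyp)))

# One column of fitted values per imputed data set (rows = observations)
fit_mat <- sapply(mod$analyses, fitted)
fit_avg <- rowMeans(fit_mat)

# Compare only against rows where bmi was actually observed
observed <- nhanes2$bmi
keep <- !is.na(observed)
plot(observed[keep], fit_avg[keep],
     xlab = "Observed bmi", ylab = "Mean fitted bmi")
abline(0, 1, lty = 2)  # reference line: fitted == observed
```

Rows with an imputed bmi are dropped from the plot because they have no observed value to compare against.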

The uncertainty of the MICE imputation method is expressed through the multiple imputed data sets it produces. Calculations such as regressions need to be pooled with "Rubin's rules". As far as I know, only simple lm or glm methods are implemented in the mice::pool(.) method. You will probably need to write some code yourself to pool, e.g., the calculations of random effects, as you probably have with lme4. You may find the needed formulae in Rubin, Donald B. Multiple Imputation for Nonresponse in Surveys. Wiley Series in Probability and Mathematical Statistics. New York: Wiley, 1987, on page 76.
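For a single scalar quantity, Rubin's rules are short enough to write by hand. The following base R sketch assumes you have one estimate and one squared standard error per imputed data set; the function name pool_rubin and the toy numbers are made up for illustration and are not part of mice.

```r
# Pool m estimates q with within-imputation variances u (Rubin's rules)
pool_rubin <- function(q, u) {
  m    <- length(q)
  qbar <- mean(q)                  # pooled estimate
  ubar <- mean(u)                  # within-imputation variance
  b    <- var(q)                   # between-imputation variance
  t    <- ubar + (1 + 1 / m) * b   # total variance
  list(estimate = qbar, ubar = ubar, b = b, t = t, se = sqrt(t))
}

# Toy input: three imputations (hypothetical numbers)
res <- pool_rubin(q = c(1, 2, 3), u = c(0.5, 0.5, 0.5))
res$estimate
res$t

# Consistency check against the intercept row of the pooled output shown
# earlier on this page (m = 5): ubar + (1 + 1/m) * b reproduces column `t`
14.857649384 + (1 + 1 / 5) * 2.799244   # about 18.2167
```

The returned names ubar, b and t mirror the columns of the summary(pool(mod)) output, so the hand-rolled result can be compared with what mice reports.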

However, if your model is not that complicated, you could combine the fitted values of the different imputations in one plot and distinguish them by color.

Example 1

library(mice)
iris.mice <- complete(mice(iris.mis), "long")
with(iris.mice, plot(Petal.Length, lm(Petal.Length ~ Sepal.Width + Petal.Width)$fitted,
                     type="n", xlab="imp.actual", ylab="imp.yhat", main="Petal.Length"))
by(iris.mice, iris.mice$.imp, function(x) {
  with(x, points(Petal.Length, lm(Petal.Length ~ Sepal.Width + Petal.Width, x)$fitted,
                 col=x$.imp))
})
legend("bottomright", legend=unique(iris.mice$.imp), pch=1, col=unique(iris.mice$.imp),
       ncol=3, title="Imp.")


Another possibility would be to use a different imputation method, e.g., missForest, which produces just one imputed data set with an error margin. Calculating more complicated models would probably be much easier without needing to pool everything, though it depends on what you need. You could report the error margin in the plot as text.

Example 2

library(missForest)
iris.imp <- missForest(iris.mis, xtrue=iris)
with(iris.imp$ximp, plot(Petal.Length, 
                         lm(Petal.Length ~ Sepal.Width + Petal.Width)$fitted,
                         xlab="imp.actual", ylab="imp.yhat", main="Petal.Length"))
text(5.5, 1.7, paste("NRMSE=", round(iris.imp$error[1], 2)))


Data

iris.mis <- structure(list(Sepal.Length = c(5.1, NA, 4.7, 4.6, NA, 5.4, 4.6, 
5, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 
5.4, 5.1, 4.6, NA, 4.8, 5, 5, 5.2, 5.2, 4.7, 4.8, 5.4, NA, NA, 
4.9, NA, NA, 4.9, 4.4, 5.1, 5, 4.5, 4.4, 5, 5.1, 4.8, 5.1, 4.6, 
5.3, 5, 7, 6.4, 6.9, 5.5, NA, 5.7, 6.3, 4.9, 6.6, 5.2, 5, 5.9, 
6, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, NA, 6.1, 6.4, 
6.6, 6.8, 6.7, 6, 5.7, 5.5, 5.5, 5.8, NA, 5.4, 6, 6.7, NA, 5.6, 
5.5, 5.5, 6.1, 5.8, NA, 5.6, 5.7, NA, 6.2, 5.1, NA, 6.3, NA, 
7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, NA, 6.5, NA, 6.8, 5.7, 5.8, 
6.4, 6.5, 7.7, 7.7, 6, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 
NA, 7.2, NA, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, NA, NA, 6.9, 6.7, 
6.9, 5.8, 6.8, 6.7, NA, NA, 6.5, 6.2, NA), Sepal.Width = c(3.5, 
3, 3.2, NA, 3.6, 3.9, 3.4, 3.4, NA, 3.1, 3.7, NA, NA, 3, NA, 
4.4, 3.9, 3.5, 3.8, 3.8, 3.4, NA, 3.6, 3.3, 3.4, 3, 3.4, 3.5, 
3.4, 3.2, 3.1, NA, 4.1, 4.2, 3.1, NA, 3.5, 3.6, 3, NA, 3.5, 2.3, 
3.2, NA, 3.8, NA, 3.8, NA, 3.7, 3.3, 3.2, NA, NA, 2.3, NA, 2.8, 
3.3, NA, 2.9, 2.7, 2, 3, 2.2, 2.9, 2.9, 3.1, 3, NA, 2.2, 2.5, 
3.2, NA, NA, 2.8, 2.9, 3, NA, NA, 2.9, 2.6, 2.4, 2.4, NA, 2.7, 
3, 3.4, 3.1, 2.3, 3, 2.5, NA, NA, 2.6, 2.3, 2.7, NA, 2.9, 2.9, 
2.5, 2.8, 3.3, 2.7, 3, 2.9, 3, 3, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 
3, 2.5, NA, NA, 3, 3.8, 2.6, NA, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 
2.8, 3, 2.8, 3, 2.8, 3.8, NA, 2.8, 2.6, NA, 3.4, 3.1, 3, 3.1, 
3.1, 3.1, 2.7, 3.2, 3.3, 3, NA, 3, NA, 3), Petal.Length = c(NA, 
1.4, NA, NA, 1.4, 1.7, 1.4, 1.5, NA, 1.5, 1.5, 1.6, NA, 1.1, 
1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1, 1.7, NA, 1.6, NA, 
1.5, NA, 1.6, 1.6, NA, 1.5, 1.4, 1.5, NA, NA, NA, 1.3, 1.5, 1.3, 
1.3, NA, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4, NA, 4.5, 4.9, 4, 
4.6, 4.5, NA, 3.3, 4.6, 3.9, NA, NA, 4, 4.7, 3.6, NA, 4.5, 4.1, 
4.5, 3.9, 4.8, 4, NA, NA, 4.3, 4.4, 4.8, 5, 4.5, 3.5, NA, 3.7, 
3.9, 5.1, NA, 4.5, NA, 4.4, 4.1, 4, 4.4, 4.6, NA, 3.3, 4.2, 4.2, 
4.2, 4.3, NA, NA, 6, 5.1, 5.9, NA, 5.8, 6.6, 4.5, 6.3, NA, 6.1, 
5.1, NA, 5.5, 5, 5.1, 5.3, 5.5, 6.7, 6.9, 5, 5.7, 4.9, 6.7, 4.9, 
5.7, 6, 4.8, 4.9, 5.6, 5.8, NA, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 
5.5, 4.8, NA, NA, 5.1, NA, 5.9, 5.7, 5.2, 5, 5.2, 5.4, 5.1), 
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, NA, 0.3, 0.2, NA, 
    0.1, 0.2, NA, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 
    0.4, 0.2, NA, 0.2, NA, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, NA, 
    0.2, 0.2, NA, 0.2, 0.1, 0.2, NA, 0.3, 0.3, 0.2, 0.6, 0.4, 
    0.3, 0.2, 0.2, 0.2, 0.2, NA, 1.5, 1.5, NA, 1.5, 1.3, NA, 
    1, 1.3, 1.4, 1, 1.5, 1, NA, 1.3, 1.4, 1.5, 1, 1.5, 1.1, NA, 
    NA, 1.5, 1.2, 1.3, 1.4, 1.4, NA, 1.5, NA, NA, 1, 1.2, NA, 
    1.5, 1.6, 1.5, 1.3, 1.3, NA, 1.2, NA, NA, 1, 1.3, 1.2, 1.3, 
    1.3, 1.1, 1.3, 2.5, NA, 2.1, 1.8, 2.2, 2.1, 1.7, NA, 1.8, 
    NA, 2, 1.9, 2.1, 2, 2.4, 2.3, NA, NA, 2.3, 1.5, 2.3, 2, 2, 
    1.8, NA, 1.8, 1.8, NA, 2.1, NA, 1.9, 2, 2.2, 1.5, 1.4, 2.3, 
    2.4, 1.8, 1.8, 2.1, 2.4, 2.3, NA, 2.3, 2.5, 2.3, 1.9, NA, 
    NA, 1.8), Species = structure(c(1L, 1L, NA, 1L, NA, 1L, NA, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L, NA, 1L, 
    1L, 1L, 1L, NA, 1L, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, NA, 
    2L, 2L, NA, 2L, NA, 2L, 2L, 2L, 2L, 2L, 2L, NA, NA, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, NA, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, NA, 2L, 2L, 2L, NA, NA, 
    NA, 2L, 2L, 3L, NA, NA, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, NA, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L, NA, 
    3L, 3L, NA, 3L, NA, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, NA, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("setosa", "versicolor", 
    "virginica"), class = "factor")), row.names = c(NA, -150L
), class = "data.frame")
