
Creating new variables after imputation with the MICE package

I have longitudinal panel data on 1000 individuals measured at two time points. Using the MICE package, I have imputed values for the variables with missing data. The imputation itself works fine and generates the required 17 imputed data frames. One of the imputed variables is fitness. I would like to create a new, scaled version of it, scale(fitness). My understanding is that I should impute first and then create the new variable from the imputed data. How do I access each of the 17 imputed datasets and generate a scaled fitness variable in each?

My original data frame looks like this (some variables not shown):

      id   age school   sex      andersen ldl_c_trad  pre_post
   <dbl> <dbl>  <fct>  <fct>        <int>      <dbl>     <fct>
 1     2  10.7      1      1          951       2.31         1
 2     2  11.3      1      1          877       2.20         2
 3     3  11.3      1      1          736       2.88         1
 4     3  11.9      1      1          668       3.36         2
 5     4  10.1      1      0          872       3.31         1
 6     4  10.7      1      0          905       2.95         2
 7     5  10.5      1      1          925       2.02         1
 8     5  11.0      1      1          860       1.92         2
 9     8  10.7      1      1          767       3.41         1
10     8  11.2      1      1          709       3.32         2

My imputation code is:

imputed <- mice(imp_vars, method = meth, predictorMatrix = predM, m = 17)

imp_vars are the variables selected for imputation. I have pre-specified both the method and the predictor matrix.

Also, my assumption is that the scaling should be performed separately for each time point, as fitness is likely to have improved over time. Is it possible to perform the scaling separately for each level of pre_post within each imputed dataset?

Many thanks.

To access an individual imputed dataset, where x is an integer from 1 to 17:

data <- complete(imputed, x)

or, if you only want the fitness variable:

complete(imputed, x)$fitness
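Rather than calling complete() once per value of x, all imputed datasets can be pulled out together. The sketch below is only an illustration: it uses the nhanes example data shipped with mice as a stand-in for imp_vars (its bmi column plays the role of fitness) and m = 3 instead of 17 to keep it quick. Those choices are assumptions made for the example, not part of the question.

```r
library(mice)

# Stand-in for the asker's object: impute the nhanes example data from mice.
imputed <- mice(nhanes, m = 3, seed = 1, printFlag = FALSE)

# action = "all" returns a list with one completed data frame per imputation.
all_imps <- complete(imputed, action = "all")

length(all_imps)                            # number of imputed datasets
sapply(all_imps, function(d) mean(d$bmi))   # one summary value per imputation
```

Each element of all_imps is an ordinary data frame, so lapply() can apply the same transformation to every imputation.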

If you want to filter observations according to the value of another variable in the data frame, you could use:

data[which(data$pre_post==1), "fitness"]

This returns the fitness observations for which pre_post == 1. From there, it is simply a matter of scaling these observations for each level of pre_post, assigning them to a new variable fitness_scaled, and then repeating for each imputation 1-17.
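The steps described above can also be done in one pass on the long format, which avoids looping over the imputations by hand. This is a hedged sketch of one possible approach, not the only one. It again uses the nhanes data from mice as a stand-in, with bmi in the role of fitness and the fully observed age group in the role of pre_post; the name bmi_scaled is introduced here purely for illustration.

```r
library(mice)

# Stand-in imputation (the question uses imp_vars with m = 17).
imputed <- mice(nhanes, m = 3, seed = 1, printFlag = FALSE)

# Stack the original data (.imp == 0) and all imputations (.imp == 1..m).
long <- complete(imputed, action = "long", include = TRUE)

# Standardise bmi within each imputation and each level of the grouping
# variable (age here, pre_post in the question).
long$bmi_scaled <- ave(
  long$bmi, long$.imp, long$age,
  FUN = function(v) as.numeric(scale(v))
)

# Convert back to a mids object so with() and pool() work as usual.
imputed2 <- as.mids(long)

fit <- with(imputed2, lm(chl ~ bmi_scaled))
summary(pool(fit))
```

For the data in the question, the same pattern would group on long$pre_post and scale long$fitness. Keeping the result as a mids object means the derived variable can be used directly in pooled analyses.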
