
Impute missing values with average of previous 13 values

I have a dataset with a few missing observations. My objective is to impute each missing value in each variable with the average of the previous 13 values. If a missing value occurs before the 13th observation, the average of whatever values come before it should be used instead. I am not sure how to do it.

Please use the code below to reproduce my dataset. Your help is much appreciated.

df1 <- structure(list(V1 = c(276.12, 53.4, 20.64, 181.8, 216.96, 10.44, 
69, 144.24, 10.32, 239.76, 79.32, 257.64, 28.56, 117, 244.92, 
234.48, NA, 337.68, 83.04, 176.76, 262.08, 284.88, 15.84, NA, 
74.76, 315.48, 171.48, 288.12, 298.56, 84.72, 351.48, 135.48, 
NA, 318.72, 114.84, 348.84, 320.28, 89.64, 51.72, 273.6, 243, 
212.4, 352.32, 248.28, NA, 210.12, 107.64, 287.88, 272.64, 80.28, 
239.76, 120.48, 259.68, 219.12, 315.24, 238.68, 8.76, 163.44, 
252.96), V2 = c(45.36, 47.16, 55.08, 49.56, 12.96, 58.68, 39.36, 
NA, 2.52, 3.12, 6.96, 28.8, NA, 9.12, 39.48, 57.24, 43.92, 47.52, 
24.6, 28.68, 33.24, 6.12, 19.08, 20.28, 15.12, 4.2, 35.16, NA, 
32.52, 19.2, 33.96, 20.88, 1.8, 24, 1.68, NA, 52.56, 59.28, 32.04, 
45.24, 26.76, 40.08, 33.24, 10.08, 30.84, 27, 11.88, 49.8, 18.96, 
14.04, 3.72, 11.52, 50.04, 55.44, 34.56, NA, 33.72, 23.04, 59.52
)), class = "data.frame", row.names = c(NA, -59L))

You can use zoo::rollapply to compute a rolling mean over a window of 13 values, then shift it by one row so that each row holds the mean of the values before it:

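# Rolling mean over a right-aligned window of 13 values, ignoring NAs.
# partial = TRUE lets the window shrink near the start of the series.
mean13 = zoo::rollapply(
    df1$V1,
    13,
    function(x) {
        mean(na.omit(x))
    },
    align = "right",
    fill = NA,
    partial = TRUE
)
# Shift by one row so each row holds the mean of the values before it
# (row 1 has no predecessors, so it keeps its own value).
df1$V1_prev_mean = c(df1$V1[1], head(mean13, -1))
# Replace each missing value with the mean of the previous values.
df1$V1 = ifelse(is.na(df1$V1), df1$V1_prev_mean, df1$V1)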

Output (first 20 rows):

         V1    V2 V1_prev_mean
1  276.1200 45.36     276.1200
2   53.4000 47.16     276.1200
3   20.6400 55.08     164.7600
4  181.8000 49.56     116.7200
5  216.9600 12.96     132.9900
6   10.4400 58.68     149.7840
7   69.0000 39.36     126.5600
8  144.2400    NA     118.3371
9   10.3200  2.52     121.5750
10 239.7600  3.12     109.2133
11  79.3200  6.96     122.2680
12 257.6400 28.80     118.3636
13  28.5600    NA     129.9700
14 117.0000  9.12     122.1692
15 244.9200 39.48     109.9292
16 234.4800 57.24     124.6615
17 141.1108 43.92     141.1108  # <- this row filled
18 337.6800 47.52     137.7200
19  83.0400 24.60     147.7800
20 176.7600 28.68     153.8300
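
The code above fills only V1, while the question asks for every variable. One possible way to extend it is sketched below in base R, without zoo. The helper name impute_prev_mean and the toy data frame are hypothetical and used only for illustration. As in the answer above, the sketch assumes the mean should be taken over the original non-missing values in the window (not over previously imputed ones), and it leaves a value as NA when nothing non-missing precedes it.

```r
# Hypothetical helper (not part of the original answer): fill each NA with the
# mean of the non-missing values among the `width` observations before it.
impute_prev_mean <- function(x, width = 13) {
  out <- x
  for (i in which(is.na(x))) {
    if (i == 1) next                          # nothing precedes the first value
    window <- x[max(1, i - width):(i - 1)]    # up to `width` previous original values
    if (any(!is.na(window))) {
      out[i] <- mean(window, na.rm = TRUE)
    }
  }
  out
}

# Small illustrative data frame (made-up values, not the question's df1)
toy <- data.frame(
  V1 = c(10, 20, NA, 40, 50),
  V2 = c(NA, 2, 4, NA, 8)
)

# Apply the helper to every column while keeping the data frame structure
toy[] <- lapply(toy, impute_prev_mean)
print(toy)
```

In the toy data, the missing V1 value becomes 15 (the mean of 10 and 20), the missing V2 value in row 4 becomes 3 (the mean of 2 and 4), and the first V2 value stays NA because no earlier value exists. The same call, df1[] <- lapply(df1, impute_prev_mean), should work on the question's df1 under the same assumptions.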
