
Filling NA values with the last non-NA value if they lie between repeated identical non-NA values

I would like to replace the NA values in my dataset with the previous non-NA value, but only if the NAs lie between identical values.

To illustrate, here's a small sample of the data:

      date        1     2     3
1  2004-12-27     NA    NA    NA
2  2004-12-28  2.299 2.349 2.348
3  2004-12-29     NA    NA    NA
4  2005-01-03     NA    NA    NA
5  2005-01-04     NA    NA    NA
6  2005-01-05  2.299    NA    NA
7  2005-01-06     NA    NA    NA
8  2005-01-10     NA    NA    NA
9  2005-01-11  2.299 2.349 2.348
10 2005-01-12     NA    NA    NA
11 2005-01-17     NA    NA    NA
12 2005-01-18  2.299    NA    NA
13 2005-01-19     NA    NA    NA
14 2005-01-24     NA    NA    NA
15 2005-01-25     NA 2.369 2.368
16 2005-01-26  2.299    NA    NA
17 2005-01-31  2.299    NA    NA
18 2005-02-01     NA    NA    NA
19 2005-02-02     NA    NA    NA
20 2005-02-08     NA    NA    NA

The ideal output would be:

     date         1     2     3
1  2004-12-27     NA    NA    NA
2  2004-12-28  2.299 2.349 2.348
3  2004-12-29  2.299 2.349 2.348
4  2005-01-03  2.299 2.349 2.348
5  2005-01-04  2.299 2.349 2.348
6  2005-01-05  2.299 2.349 2.348
7  2005-01-06  2.299 2.349 2.348
8  2005-01-10  2.299 2.349 2.348
9  2005-01-11  2.299 2.349 2.348
10 2005-01-12  2.299    NA    NA
11 2005-01-17  2.299    NA    NA
12 2005-01-18  2.299    NA    NA
13 2005-01-19  2.299    NA    NA
14 2005-01-24  2.299    NA    NA
15 2005-01-25  2.299 2.369 2.368
16 2005-01-26  2.299    NA    NA
17 2005-01-31  2.299    NA    NA

Here's a reproducible sample of the dataset using dput:

structure(list(data_gas = structure(c(12779, 12780, 12781, 12786, 
12787, 12788, 12789, 12793, 12794, 12795, 12800, 12801, 12802, 
12807, 12808, 12809, 12814, 12815, 12816, 12822), class = "Date"), 
    `1` = c(NA, 2.299, NA, NA, NA, 2.299, NA, NA, 2.299, NA, 
    NA, 2.299, NA, NA, NA, 2.299, 2.299, NA, NA, NA), `3` = c(NA, 
    2.349, NA, NA, NA, NA, NA, NA, 2.349, NA, NA, NA, NA, NA, 
    2.369, NA, NA, NA, NA, NA), `4` = c(NA, 2.348, NA, NA, NA, 
    NA, NA, NA, 2.348, NA, NA, NA, NA, NA, 2.368, NA, NA, NA, 
    NA, NA)), row.names = c(NA, 20L), class = "data.frame")

I've tried a few for loops without success.

Any help will be greatly appreciated.

Here is a base R solution using a for loop.

Write a function that compares each pair of consecutive non-NA values and, if they are the same, fills the NA values between them with that value.

fill_NA_values <- function(x) {
  #Positions of the non-NA values
  non_na_values <- which(!is.na(x))
  #Loop over each pair of consecutive non-NA positions
  for(i in seq_along(non_na_values[-1])) {
    left <- non_na_values[i]
    right <- non_na_values[i + 1]
    #Fill only if there is at least one NA in between and both ends hold the same value
    #(without the gap check, adjacent non-NA values would give a reversed index like 17:16)
    if(right - left > 1 && x[left] == x[right]) {
      x[(left + 1):(right - 1)] <- x[left]
    }
  }
  x
}

Apply this to multiple columns using lapply.

df[-1] <- lapply(df[-1], fill_NA_values)
df

#         date    X1    X3    X4
#1  2004-12-27    NA    NA    NA
#2  2004-12-28 2.299 2.349 2.348
#3  2004-12-29 2.299 2.349 2.348
#4  2005-01-03 2.299 2.349 2.348
#5  2005-01-04 2.299 2.349 2.348
#6  2005-01-05 2.299 2.349 2.348
#7  2005-01-06 2.299 2.349 2.348
#8  2005-01-10 2.299 2.349 2.348
#9  2005-01-11 2.299 2.349 2.348
#10 2005-01-12 2.299    NA    NA
#11 2005-01-17 2.299    NA    NA
#12 2005-01-18 2.299    NA    NA
#13 2005-01-19 2.299    NA    NA
#14 2005-01-24 2.299    NA    NA
#15 2005-01-25 2.299 2.369 2.368
#16 2005-01-26 2.299    NA    NA
#17 2005-01-31 2.299    NA    NA
#18 2005-02-01    NA    NA    NA
#19 2005-02-02    NA    NA    NA
#20 2005-02-08    NA    NA    NA
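As an alternative to the loop, here is a vectorized base R sketch. The name fill_between_identical is hypothetical and is not part of the original answer. It applies the same rule by looking up, for every position, the nearest non-NA value before it and the nearest non-NA value after it, and fills an NA only when both exist and are equal. This is the same idea as comparing a forward fill with a backward fill (for example zoo::na.locf with and without fromLast = TRUE), done here without extra packages. Like the loop, it compares values with exact ==, which assumes the repeated prices are stored identically.

```r
# Vectorized sketch (hypothetical helper): fill an NA only when the nearest
# non-NA value before it equals the nearest non-NA value after it.
fill_between_identical <- function(x) {
  n <- length(x)
  idx <- seq_len(n)
  ok <- !is.na(x)

  # Position of the last non-NA value at or before each element (0 = none).
  prev_idx <- cummax(ifelse(ok, idx, 0L))
  # Position of the next non-NA value at or after each element (n + 1 = none).
  next_idx <- rev(cummin(rev(ifelse(ok, idx, n + 1L))))

  # Look up the neighbouring values, using NA where no neighbour exists.
  prev_val <- x[ifelse(prev_idx == 0L, NA_integer_, prev_idx)]
  next_val <- x[ifelse(next_idx > n, NA_integer_, next_idx)]

  # Fill only NA positions whose two neighbours exist and are equal.
  fill <- !ok & !is.na(prev_val) & !is.na(next_val) & prev_val == next_val
  x[fill] <- prev_val[fill]
  x
}

x <- c(NA, 2.299, NA, NA, 2.299, NA, 2.369, NA)
fill_between_identical(x)
# returns NA 2.299 2.299 2.299 2.299 NA 2.369 NA
```

Assuming df holds the question's data frame, it can be applied column-wise in the same way as the loop version: df[-1] <- lapply(df[-1], fill_between_identical).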

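Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.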

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM