Fill NAs with next columns for moving average

library(dplyr)  # provides %>%, group_by() and mutate() used below

df <- data.frame(loc.id = rep(1:3, each = 4 * 10),
                 year = rep(rep(1980:1983, each = 10), times = 3),
                 day = rep(1:10, times = 3 * 4),
                 x = sample(123:200, 4 * 3 * 10, replace = TRUE))

I want to add one more column, x.mv, which is the 3-day moving average of x for each loc.id and year combination:

df %>% group_by(loc.id, year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = NA, align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145   NA
 2      1  1980     2   184   NA
 3      1  1980     3   154  161
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171
 7      1  1980     7   164  162
 8      1  1980     8   192  161.
 9      1  1980     9   166  174
10      1  1980    10   158  172

What I want to do is replace the NAs in the x.mv column with the corresponding values of x. I tried this:

df %>% group_by(loc.id,year) %>% mutate(x.mv = zoo::rollmean(x, 3, fill = x[1:2], align = "right"))

   loc.id  year   day     x  x.mv
    <int> <int> <int> <int> <dbl>
 1      1  1980     1   145  145
 2      1  1980     2   184  145
 3      1  1980     3   154  161
 4      1  1980     4   191  176.
 5      1  1980     5   196  180.
 6      1  1980     6   126  171
 7      1  1980     7   164  162
 8      1  1980     8   192  161.
 9      1  1980     9   166  174
10      1  1980    10   158  172

But instead of using the corresponding value of x, this fills both NAs with the first value of x (row 2 gets 145, not 184). How do I fix it?

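Skip the fill argument and pad manually. Without fill, rollmean() returns only the complete windows (two fewer values than x), so prepending x[1:2] restores the original length:

df %>%
  group_by(loc.id, year) %>%
  mutate(x.mv = c(x[1:2], zoo::rollmean(x, 3, align = "right")))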

# # A tibble: 120 x 5
#   loc.id  year   day     x     x.mv
#    <int> <int> <int> <int>    <dbl>
# 1      1  1980     1   145 145.0000
# 2      1  1980     2   184 184.0000
# 3      1  1980     3   154 161.0000
# 4      1  1980     4   191 176.3333
# 5      1  1980     5   196 180.3333
# 6      1  1980     6   126 171.0000
# 7      1  1980     7   164 162.0000
# 8      1  1980     8   192 160.6667
# 9      1  1980     9   166 174.0000
# 10     1  1980    10   158 172.0000
# # ... with 110 more rows
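
To see why manual padding works where fill = x[1:2] did not, it can help to try both on a plain vector. The sketch below uses the first five x values from the output above; the variable names are only illustrative. As far as I understand the zoo documentation, fill is treated as a (left, interior, right) specification and recycled, rather than matched position by position, which would explain why both leading rows received x[1].

```r
library(zoo)

# First five x values from the example output above
x <- c(145, 184, 154, 191, 196)

# Without fill, only the complete 3-value windows are returned (length 3 here)
complete <- rollmean(x, 3, align = "right")

# Manual padding: prepend the first two raw values to restore the length of x
padded <- c(x[1:2], complete)

# The attempt from the question, for comparison; fill is not matched
# element by element to the missing positions
filled <- rollmean(x, 3, fill = x[1:2], align = "right")

print(padded)
print(filled)
```

Note that c(x[1:2], ...) assumes every loc.id/year group has at least two rows; for shorter groups the lengths would no longer match.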

You might want to use dplyr::cummean(x[1:2]) instead of x[1:2], so that the second value is already the average of the first two. Or, in this case, use @g-grothendieck's suggestion from the comments and rewrite your mutate call as mutate(x.mv = zoo::rollapplyr(x, 3, mean, partial = TRUE)).
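
As a rough check of the partial-window variant, here is a minimal sketch on the same five x values (variable names are illustrative). With partial = TRUE, the leading positions are averaged over however many values are available, so the result should match a cumulative mean for the first two entries followed by the ordinary 3-value rolling mean.

```r
library(zoo)

x <- c(145, 184, 154, 191, 196)

# Right-aligned rolling mean that also accepts shorter windows at the start:
# position 1 uses 1 value, position 2 uses 2 values, the rest use 3
partial <- rollapplyr(x, 3, mean, partial = TRUE)

# Equivalent manual construction: cumulative mean of the first two values
# (base R here; dplyr::cummean(x[1:2]) computes the same thing),
# followed by the complete windows
lead <- cumsum(x[1:2]) / seq_along(x[1:2])
manual <- c(lead, rollmean(x, 3, align = "right"))

print(partial)
print(manual)
```

Unlike the manual padding, rollapplyr() with partial = TRUE also copes with groups that have fewer than three rows.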
