R: Days since last non-NA, grouped by column

I have a dataframe that looks like this:

df_raw <- structure(list(date = structure(c(17075, 17076, 17077, 17108, 
17109, 17110, 17111, 17112, 17113, 17221, 17222, 17223, 17224, 
17225, 17226, 17227, 17228, 17229, 17230, 17231, 17232, 17286, 
17075, 17076, 17077, 17078, 17079, 17080, 17081, 17082, 17083, 
17084, 17085, 17086, 17087, 17088, 17089, 17090, 17091), class = "Date"), 
    Req_BU = c("12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "12018", "12018", "12018", "12018", "12018", 
    "12018", "12018", "14004", "14004", "14004", "14004", "14004", 
    "14004", "14004", "14004", "14004", "14004", "14004", "14004", 
    "14004", "14004", "14004", "14004", "14004"), last_rec_date = c(1L, 
    1L, 1L, 1L, 1L, NA, NA, 3L, 1L, 1L, 1L, NA, 2L, 1L, 1L, 1L, 
    1L, 1L, NA, NA, 3L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, NA, 
    3L, 1L, 1L, 1L, 1L, NA, 2L, 1L)), .Names = c("date", "Req_BU", 
"last_rec_date"), row.names = c(NA, -39L), class = "data.frame")


> head(df_raw, 10)
         date Req_BU last_rec_date
1  2016-10-01  12018             1
2  2016-10-02  12018             1
3  2016-10-03  12018             1
4  2016-11-03  12018             1
5  2016-11-04  12018             1
6  2016-11-05  12018            NA
7  2016-11-06  12018            NA
8  2016-11-07  12018             3
9  2016-11-08  12018             1
10 2017-02-24  12018             1

> df_raw[22:30, ]
         date Req_BU last_rec_date
22 2017-04-30  12018             1
23 2016-10-01  14004            NA
24 2016-10-02  14004            NA
25 2016-10-03  14004             1
26 2016-10-04  14004             1
27 2016-10-05  14004             1
28 2016-10-06  14004             1
29 2016-10-07  14004             1
30 2016-10-08  14004            NA

What I need to do is replace the NA values in the last_rec_date column with the number of days since the last non-NA value. This needs to be done separately within each group of a grouping variable called Req_BU. My data start on 10-01-2016, and if a particular Req_BU starts with an NA on that date, I need to fill it with a 1 and keep filling 1s until the first non-NA value, at which point the normal logic takes over.

I am looking for something like this.

> head(df_hope, 10)
         date Req_BU last_rec_date
1  2016-10-01  12018             1
2  2016-10-02  12018             1
3  2016-10-03  12018             1
4  2016-11-03  12018             1
5  2016-11-04  12018             1
6  2016-11-05  12018             1
7  2016-11-06  12018             2
8  2016-11-07  12018             3
9  2016-11-08  12018             1
10 2017-02-24  12018             1

> df_hope[22:30, ]
         date Req_BU last_rec_date
22 2017-04-30  12018             1
23 2016-10-01  14004             1
24 2016-10-02  14004             1
25 2016-10-03  14004             1
26 2016-10-04  14004             1
27 2016-10-05  14004             1
28 2016-10-06  14004             1
29 2016-10-07  14004             1
30 2016-10-08  14004             1

I tried this, but it isn't even handling the first part of the logic I need.

library(dplyr)
df_not_working <- df_raw %>%
  group_by(Req_BU) %>%
  mutate(last_rec_date = ifelse(is.na(last_rec_date), 
                                c(NA, diff(date)), 
                                  last_rec_date))

> df_not_working
Source: local data frame [39 x 3]
Groups: Req_BU [2]

# A tibble: 39 x 3
         date Req_BU last_rec_date
       <date>  <chr>         <dbl>
 1 2016-10-01  12018             1
 2 2016-10-02  12018             1
 3 2016-10-03  12018             1
 4 2016-11-03  12018             1
 5 2016-11-04  12018             1
 6 2016-11-05  12018             1
 7 2016-11-06  12018             1
 8 2016-11-07  12018             3
 9 2016-11-08  12018             1
10 2017-02-24  12018             1

The rest of the analysis is pretty dplyr heavy so I am OK using that or a base solution if one exists. Thank you.

Maybe this will work? Not very R-ish so maybe someone has a better approach.

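# Replace each NA in column `colname` with its position within the current
# run of consecutive NAs (1, 2, 3, ...); non-NA values are left unchanged.
fill_na <- function(df, colname) {
  x <- 1  # counter for the current run of NAs
  for (i in seq_len(nrow(df))) {
    if (is.na(df[i, colname])) {
      df[i, colname] <- x
      x <- x + 1
    } else {
      x <- 1  # a non-NA value resets the counter
    }
  }
  df
}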

df_hope <- unsplit(lapply(split(df_raw, f = df_raw$Req_BU), fill_na, colname = "last_rec_date"), f = df_raw$Req_BU)

Edit: Made a clearer example to test approach:

example_df <- structure(list(date = structure(c(17075, 17076, 17077, 17108, 
17109, 17083, 17084, 17085, 17086, 17087), class = "Date"), Req_BU = c("12018", 
"12018", "12018", "12018", "12018", "14004", "14004", "14004", 
"14004", "14004"), last_rec_date = c(1L, 1L, 1L, NA, NA, NA, 
NA, NA, 1L, 1L)), .Names = c("date", "Req_BU", "last_rec_date"
), row.names = c(1L, 2L, 3L, 4L, 5L, 31L, 32L, 33L, 34L, 35L), class = "data.frame")

> example_df
         date Req_BU last_rec_date
1  2016-10-01  12018             1
2  2016-10-02  12018             1
3  2016-10-03  12018             1
4  2016-11-03  12018            NA
5  2016-11-04  12018            NA
31 2016-10-09  14004            NA
32 2016-10-10  14004            NA
33 2016-10-11  14004            NA
34 2016-10-12  14004             1
35 2016-10-13  14004             1

This example has NA values on both sides of the "border" between Req_BU 12018 and 14004. Split the dataframe by Req_BU into a list of independent dataframes, use lapply to apply the function above to each one, then use unsplit to return to a single dataframe in the original row order.

df_ex <- unsplit(lapply(split(example_df, f = example_df$Req_BU), fill_na, colname = "last_rec_date"), f = example_df$Req_BU)

> df_ex
         date Req_BU last_rec_date
1  2016-10-01  12018             1
2  2016-10-02  12018             1
3  2016-10-03  12018             1
4  2016-11-03  12018             1
5  2016-11-04  12018             2
31 2016-10-09  14004             1
32 2016-10-10  14004             2
33 2016-10-11  14004             3
34 2016-10-12  14004             1
35 2016-10-13  14004             1
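
If the row-by-row loop gets slow on larger data, a vectorized version is possible. Below is a minimal sketch in base R, not a tested drop-in. It assumes, like the loop above, that the fill value is the row position within each run of consecutive NAs rather than a calendar-day difference. fill_na_runs and its leading argument are hypothetical names made up for this sketch. With leading = "one", NAs at the start of a group are filled with 1, which is what the question asks for; the default reproduces the 1, 2, 3 counting shown in df_ex.

```r
# Hypothetical helper: replace each NA with its position inside the current
# run of consecutive NAs. Assumes rows are already in date order.
fill_na_runs <- function(x, leading = c("count", "one")) {
  leading <- match.arg(leading)
  is_na <- is.na(x)
  # run_id increases by one at every non-NA value, so an NA run shares the
  # id of the non-NA value just before it (0 for NAs that start the vector)
  run_id <- cumsum(!is_na)
  # running count of NAs within each run: 1, 2, 3, ...
  pos <- ave(as.integer(is_na), run_id, FUN = cumsum)
  # optionally fill NAs that start the vector with 1 instead of counting
  if (leading == "one") pos[run_id == 0L] <- 1L
  x[is_na] <- pos[is_na]
  x
}

# Same grouping and NA pattern as example_df above (dates omitted)
ex <- data.frame(
  Req_BU = rep(c("12018", "14004"), each = 5),
  last_rec_date = c(1L, 1L, 1L, NA, NA, NA, NA, NA, 1L, 1L)
)

# ave() applies the helper within each Req_BU group and keeps row order
counted <- ave(ex$last_rec_date, ex$Req_BU, FUN = fill_na_runs)
# 1 1 1 1 2 1 2 3 1 1

ones <- ave(ex$last_rec_date, ex$Req_BU,
            FUN = function(v) fill_na_runs(v, leading = "one"))
# 1 1 1 1 2 1 1 1 1 1
```

Since the rest of the analysis is dplyr-based, the same helper should also work inside a grouped mutate, e.g. df_raw %>% group_by(Req_BU) %>% mutate(last_rec_date = fill_na_runs(last_rec_date, leading = "one")).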
