
Identifying the most recent datetime

I'm having some trouble with the logic I need to produce df$val_most_recent. If there's a value for both a_val and b_val, val_most_recent should be the value with the most recent time (a_val corresponds to a_dtm, b_val corresponds to b_dtm). If the times are identical, I'd like a_val to be val_most_recent. If just one of the two values is reported (with the other being NA), val_most_recent should simply be that one.

library(tidyverse)
library(lubridate)

location <- c("a", "b", "c", "d")
a_dtm <- ymd_hm(c(NA, "2019-06-05 10:30", "2019-06-05 10:45", "2019-06-05 10:50"))
b_dtm <- ymd_hm(c("2019-06-05 10:30", NA,  "2019-06-05 10:48", "2019-06-05 10:50"))
a_val <- c(NA, 6, 4, 2)
b_val <- c(5, NA, 3, 2)

df <- data.frame(location, a_dtm, b_dtm, a_val, b_val)

as_tibble(df)
# A tibble: 4 x 5
#location a_dtm               b_dtm               a_val b_val
#<fct>    <dttm>              <dttm>              <dbl> <dbl>
#1 a        NA                  2019-06-05 10:30:00    NA     5
#2 b        2019-06-05 10:30:00 NA                      6    NA
#3 c        2019-06-05 10:45:00 2019-06-05 10:48:00     4     3
#4 d        2019-06-05 10:50:00 2019-06-05 10:50:00     2     2

val_most_recent <- c(5,6,3,2)
desired_df <- cbind(df, val_most_recent)
as_tibble(desired_df)

#location a_dtm               b_dtm                  a_val    b_val val_most_recent
#<fct>    <dttm>              <dttm>                 <dbl>   <dbl>      <dbl>
#1 a        NA                  2019-06-05 10:30:00    NA     5           5
#2 b        2019-06-05 10:30:00 NA                      6    NA           6
#3 c        2019-06-05 10:45:00 2019-06-05 10:48:00     4     3           3
#4 d        2019-06-05 10:50:00 2019-06-05 10:50:00     2     2           2

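Here is one option in base R: convert the datetimes to numeric, replace the NAs with 0 so a missing time always loses, get the index of the column with the max value in each row (ties resolve to the first column, so a_val wins when the times are equal), cbind it with the row index, and extract the corresponding values from the a_val/b_val columns.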

m1 <- sapply(df[2:3], as.numeric)
df$val_most_recent <- df[4:5][cbind(seq_len(nrow(m1)), 
         max.col(replace(m1, is.na(m1), 0), "first"))]
df$val_most_recent
#[1] 5 6 3 2
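
The one-liner above packs several steps together, so here is a rough step-by-step sketch of the same idea with the intermediate results named. It uses base R only; the helper to_dtm and the names col_idx and idx are my own for illustration, and as.POSIXct stands in for ymd_hm under the assumption that the timestamps are in UTC.

```r
# Rebuild the question's data with base R only.
# to_dtm is a hypothetical helper standing in for lubridate::ymd_hm.
to_dtm <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

df <- data.frame(
  location = c("a", "b", "c", "d"),
  a_dtm = to_dtm(c(NA, "2019-06-05 10:30", "2019-06-05 10:45", "2019-06-05 10:50")),
  b_dtm = to_dtm(c("2019-06-05 10:30", NA, "2019-06-05 10:48", "2019-06-05 10:50")),
  a_val = c(NA, 6, 4, 2),
  b_val = c(5, NA, 3, 2)
)

# Step 1: datetimes as seconds since the epoch; NA becomes 0 so it always loses
m1 <- sapply(df[2:3], as.numeric)
m1[is.na(m1)] <- 0

# Step 2: per row, the column holding the larger time; ties go to column 1 (a)
col_idx <- max.col(m1, ties.method = "first")

# Step 3: pair each row number with its winning column, then index the values
idx <- cbind(seq_len(nrow(df)), col_idx)
val_most_recent <- as.matrix(df[4:5])[idx]

print(col_idx)
print(val_most_recent)
```

Passing "first" matters here: as far as I can tell from the max.col documentation, the default "random" method breaks ties at random and treats nearly equal entries as ties, which is not what you want for timestamps a few minutes apart.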

Here is the logic from your text coded into a case_when statement:

df %>%
  mutate(
    val_most_recent = case_when(
      is.na(a_val) | is.na(b_val) ~ coalesce(a_val, b_val),
      a_dtm >= b_dtm ~ a_val,
      TRUE ~ b_val
    )
  )
#   location               a_dtm               b_dtm a_val b_val val_most_recent
# 1        a                <NA> 2019-06-05 10:30:00    NA     5               5
# 2        b 2019-06-05 10:30:00                <NA>     6    NA               6
# 3        c 2019-06-05 10:45:00 2019-06-05 10:48:00     4     3               3
# 4        d 2019-06-05 10:50:00 2019-06-05 10:50:00     2     2               2
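
If you would rather not depend on dplyr, the same three rules can probably be written with a single ifelse in base R. This is only a sketch: to_dtm and use_b are names I made up, as.POSIXct stands in for ymd_hm with UTC assumed, and I am assuming that when both values are present but one of the times is missing, falling back to a_val is acceptable.

```r
# to_dtm is a hypothetical helper standing in for lubridate::ymd_hm.
to_dtm <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

a_dtm <- to_dtm(c(NA, "2019-06-05 10:30", "2019-06-05 10:45", "2019-06-05 10:50"))
b_dtm <- to_dtm(c("2019-06-05 10:30", NA, "2019-06-05 10:48", "2019-06-05 10:50"))
a_val <- c(NA, 6, 4, 2)
b_val <- c(5, NA, 3, 2)

# b is strictly newer only when both times are known and b_dtm is later
b_newer <- !is.na(a_dtm) & !is.na(b_dtm) & b_dtm > a_dtm

# Use b_val when it exists and either a_val is missing or b is strictly newer;
# everything else (ties, missing b_val) falls through to a_val
use_b <- !is.na(b_val) & (is.na(a_val) | b_newer)
val_most_recent <- ifelse(use_b, b_val, a_val)

print(val_most_recent)
```

Because the comparison is strict (>), identical times keep a_val, which matches the tie rule in the question.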
