Conditionally mutate column across list of dataframes in R

I am working with a large list of dataframes that use inconsistent date formats. I would like to conditionally mutate across the list so that any dataframe containing a particular string (here "/19 ") uses one date format for its whole date_time column, and dataframes that do not contain the string use another format. In other words, I want to distinguish between dataframes launched in 2019 (which use mdy) and those launched in all other years (which use dmy).

The following code applies the condition row by row within each dataframe, but I am unsure how to apply a single condition to the entire column.

dataframes %>% map(~.x %>% 
    mutate(date_time = if_else(str_detect(date_time, "/19 "), 
                               mdy_hms(date_time), dmy_hms(date_time))))

Thank you!

Edit:

Here is example data and code. Some dataframes contain a mixture of years.

library(tidyverse)
library(lubridate)

dataframes <- list(
  tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2), # July 6th
  tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)  # July 6th 
)

dataframes %>% 
  map(~.x %>% 
        mutate(date_time = if_else(str_detect(date_time, "/19 "), 
                                   mdy_hms(date_time), dmy_hms(date_time)),
               date = date(date_time),
               month = month(date_time),
               doy = yday(date_time)))
                   

[[1]]
# A tibble: 2 × 5
  date_time             num date       month   doy
  <dttm>              <int> <date>     <dbl> <dbl>
1 2019-07-06 13:00:00     1 2019-07-06     7   187
2 2020-06-07 13:00:00     2 2020-06-07     6   159

[[2]]
# A tibble: 2 × 5
  date_time             num date       month   doy
  <dttm>              <int> <date>     <dbl> <dbl>
1 2020-07-06 13:00:00     1 2020-07-06     7   188
2 2021-07-06 13:00:00     2 2021-07-06     7   187
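In the output above, the second row of [[1]] ("07/06/20 01:00:00 PM", commented as July 6th) comes out as 2020-06-07. That is because if_else() applies the str_detect() test to each element separately, so only rows that themselves contain "/19 " are parsed as mdy. A minimal base-R illustration of the difference between an element-wise test and a single test for the whole column (x is simply a copy of the first dataframe's dates):

```r
x <- c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM")

# Element-wise: ifelse() evaluates the test for every element on its own
per_row <- ifelse(grepl("/19 ", x, fixed = TRUE), "mdy", "dmy")
per_row
#> [1] "mdy" "dmy"

# Whole vector: any() collapses the test to a single TRUE/FALSE,
# so one branch is chosen for every element
per_column <- if (any(grepl("/19 ", x, fixed = TRUE))) "mdy" else "dmy"
per_column
#> [1] "mdy"
```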

Without seeing your data, we can't test your code. However, one likely issue is the pattern argument of str_detect().

You are providing "/19 ", but I'm assuming it's not actually matching anything, which means everything is getting parsed as dmy. The / character is not a regex metacharacter, so it does not need escaping (writing "\/" in an R string is actually a syntax error). Depending on how your data are formatted, you will probably need to remove the trailing space, which leaves "/19". Alternatively, if the YY part of the date always ends the string, you could use "19$".
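A quick way to check which pattern actually matches is to run str_detect() on a few sample strings. The strings below are assumptions modelled on the two formats that appear in this thread; none of the patterns escape the "/".

```r
library(stringr)

# Sample strings modelled on the formats in this thread (assumed, not real data)
s <- c("07/06/19 01:00:00 PM", "04/16/19", "16/04/20")

with_space <- str_detect(s, "/19 ")  # requires a space after the year
no_space   <- str_detect(s, "/19")   # "/" needs no escaping in the regex
at_end     <- str_detect(s, "19$")   # only matches when YY ends the string

with_space  # TRUE FALSE FALSE
no_space    # TRUE TRUE FALSE
at_end      # FALSE TRUE FALSE
```

One caveat: "/19" also matches a day or month of 19 in the middle of a date (for example "06/19/20"), so a pattern anchored to where the year sits, such as "19$" or "/19 ", is less likely to give false positives.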

library(tidyverse)
library(lubridate)

l <- list(
  tibble(date_time = c("04/16/19", "04/17/19"), num = 1:2),
  tibble(date_time = c("16/04/20", "17/04/20"), num = 1:2)
)

# Parse each value as mdy if it ends in "19", otherwise as dmy
f <- function(x) {
  mutate(x,
         date_time = if_else(
           str_detect(date_time, "19$"),  # tested separately for each row
           mdy(date_time),
           dmy(date_time)
         ))
}

l %>% map(f)
#> [[1]]
#> # A tibble: 2 × 2
#>   date_time    num
#>   <date>     <int>
#> 1 2019-04-16     1
#> 2 2019-04-17     2
#> 
#> [[2]]
#> # A tibble: 2 × 2
#>   date_time    num
#>   <date>     <int>
#> 1 2020-04-16     1
#> 2 2020-04-17     2

Created on 2022-07-19 by the reprex package (v2.0.1)
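The function above still decides row by row. For the data in the question's edit, where a dataframe launched in 2019 also contains later dates, one option is to make the decision once per dataframe: test the column with any(), then apply the chosen parser to every row. This is a sketch, assuming that a single "/19 " match identifies a dataframe as launched in 2019 and that every row of such a dataframe uses mdy; the helper name parse_launch_dates() is hypothetical.

```r
library(dplyr)
library(purrr)
library(stringr)
library(lubridate)

dataframes <- list(
  tibble(date_time = c("07/06/19 01:00:00 PM", "07/06/20 01:00:00 PM"), num = 1:2),
  tibble(date_time = c("06/07/20 01:00:00 PM", "06/07/21 01:00:00 PM"), num = 1:2)
)

# Hypothetical helper: choose ONE parser per dataframe, then apply it to every row
parse_launch_dates <- function(df) {
  # any() turns the per-row test into a single TRUE/FALSE for the whole dataframe
  launched_2019 <- any(str_detect(df$date_time, fixed("/19 ")))
  parser <- if (launched_2019) mdy_hms else dmy_hms
  df %>%
    mutate(date_time = parser(date_time),
           date = as_date(date_time),
           month = month(date_time),
           doy = yday(date_time))
}

result <- map(dataframes, parse_launch_dates)
result
```

With this rule, both rows of each example dataframe are parsed as July 6th. The trade-off is that one matching row decides the format for the whole dataframe, which fits the "launched in 2019" rule but would misparse a dataframe that genuinely mixes both conventions.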
