
Best way to treat an incorrect date in R

I need to convert the dates column of the data frame df in the example below into a Date column with df$dates %<>% as.Date.

df <- structure(list(dates = structure(c(19L, 18L, 17L, 16L, 14L, 13L, 
12L, 11L, 9L, 8L, 7L, 6L, 21L, 20L, 15L, 10L, 5L, 4L, 3L, 2L, 
1L), .Label = c("2014-12-31", "2015-06-30", "2015-12-31", "2016-06-30", 
"2016-12-31", "2017-01-31", "2017-03-31", "2017-06-30", "2017-09-31", 
"2017-12-31", "2018-01-31", "2018-03-31", "2018-06-30", "2018-09-31", 
"2018-12-31", "2019-01-31", "2019-03-31", "2019-06-30", "2019-09-31", 
"2019-12-31", "2020-06-30"), class = "factor")), class = "data.frame", row.names = c(NA, 
-21L))

However, this column contains some invalid dates, and they cause an error.

For example, 2019-09-31 is not a real date (September has only 30 days), so "2019-09-31" %>% as.Date gives the error:

Error in charToDate(x): character string is not in a standard unambiguous format
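A quick base R check (a minimal sketch, assuming every value uses the "YYYY-MM-DD" layout) shows that the error only comes from as.Date() guessing the format. When the format is passed explicitly, a string that matches the layout but names a non-existent day is returned as NA instead, which is what makes it possible to find and fill the bad entries afterwards.

```r
# Two strings in the question's "YYYY-MM-DD" layout; the second is not a real date
x <- c("2019-06-30", "2019-09-31")

# With an explicit format, as.Date() does not raise the charToDate() error;
# strings that match the layout but name a non-existent day become NA
parsed <- as.Date(x, format = "%Y-%m-%d")
print(parsed)          # first element is a Date, second is NA
print(is.na(parsed))   # flags the entries that still need fixing
```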

What is the best way to replace every invalid date with the first day of the following month, given that these values cannot be converted to Date objects?

dplyr::coalesce returns, position by position, the first non-NA value among its arguments. So if you know that the only reason some of your dates fail to parse is that they fall one day past the end of the month, you can parse everything first and then fill the resulting NAs with the first day of the following month.

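library(lubridate); library(dplyr)

# ymd() returns NA (with a warning) for invalid strings such as "2019-09-31"
okay_dates <- ymd(df$dates)
# 1st of each row's month; for Date input, ceiling_date() moves a month start up to the next month
next_mo <- ymd(paste(substr(df$dates, 1, 7), "01")) %>% ceiling_date("month")
# Keep the parsed date where it exists, otherwise fall back to next_mo
coalesce(okay_dates, next_mo)
# To overwrite the column instead: df$dates <- coalesce(okay_dates, next_mo)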

 [1] "2019-10-01" "2019-06-30" "2019-03-31" "2019-01-31" "2018-10-01" "2018-06-30" "2018-03-31"
 [8] "2018-01-31" "2017-10-01" "2017-06-30" "2017-03-31" "2017-01-31" "2020-06-30" "2019-12-31"
[15] "2018-12-31" "2017-12-31" "2016-12-31" "2016-06-30" "2015-12-31" "2015-06-30" "2014-12-31"
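If you would rather avoid the lubridate and dplyr dependencies, the same idea can be sketched in base R. This assumes every value is a "YYYY-MM-DD" string and that anything failing to parse should become the first day of the following month. fix_dates() is a hypothetical helper name, and the "add 31 days to the 1st, then reset the day" step is a swapped-in substitute for ceiling_date(), not part of the answer above.

```r
# fix_dates() is a hypothetical helper: parse "YYYY-MM-DD" strings and move any
# invalid date (e.g. "2019-09-31") to the first day of the following month.
fix_dates <- function(x) {
  x <- as.character(x)                       # factor -> its character labels
  parsed <- as.Date(x, format = "%Y-%m-%d")  # invalid calendar dates become NA
  bad <- is.na(parsed)
  if (any(bad)) {
    # First day of the stated month, e.g. "2019-09-31" -> 2019-09-01
    month_start <- as.Date(paste0(substr(x[bad], 1, 7), "-01"), format = "%Y-%m-%d")
    # The 1st of any month plus 31 days always falls in the next month,
    # so resetting the day to "01" gives the first day of that month
    parsed[bad] <- as.Date(format(month_start + 31, "%Y-%m-01"), format = "%Y-%m-%d")
  }
  # Subassignment keeps the Date class (ifelse() would return bare numbers)
  parsed
}

example <- factor(c("2019-09-31", "2019-06-30", "2018-09-31", "2020-06-30"))
fix_dates(example)
# should give 2019-10-01, 2019-06-30, 2018-10-01, 2020-06-30
```

With the question's df, df$dates <- fix_dates(df$dates) replaces the factor column with a Date column. Values whose year-month part is itself malformed stay NA rather than being guessed.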

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM