简体   繁体   中英

R - Formatting dates in dataframe - mix of decimal and character values

I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date) .

The column contains date information in several formats and I am looking to get this into the one format.

#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019  16:00:00")

#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")

I have tried many work arounds including openxlsx::ConvertToDate , lubridate::parse_date_time , lubridate::date_decimal

openxlsx::ConvertToDate so far works best but it will only take 1 format and coerce NAs for the others

update

I realized I actually had one of the above output dates wrong. Value 43469.494444444441 should convert to 2019-04-01.

Here is one way to do this in two-step. Change excel dates separately and all other dates differently. If you have some more formats of dates that can be added in parse_date_time .

temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")

temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"

You could use a helper function to normalize the dates which might be slightly faster than lubridate .

There are weird origins in MS Excel that depend on platform. So if the data are imported from different platforms, you may want to work woth dummy variables.

normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x))))  # Win excel
  as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
  as.Date(x, format="%m/%d/%Y %I:%M %p")
else
  as.Date(x, format="%m/%d/%Y %R")
})

For additional date formats just add another else if . Format specifications can be found with ?strptime .

Then just use as.Date() with usual origin.

res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019-01-04"         "2019-12-31"         "2019-12-01"
class(res)
# [1] "Date"

Edit: To achieve a specific output format, use format , eg

format(res, "%Y-%d-%m")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019-04-01"         "2019-31-12"         "2019-01-12" 

format(res, "%Y/%d/%m")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019/04/01"         "2019/31/12"         "2019/01/12" 

To lookup the codes type ?strptime .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM