
Is there a way to clean date and time data in R?

I am trying to bin times of day: 4 am to 12 pm as morning, 12 pm to 9 pm as evening, and 9 pm to 4 am as night. The goal is a logistic regression model that predicts whether an arrest happens, given the type of crime and the time of the crime.

I have tried the lubridate functions, but because the values are stored as strings I have not been able to use them. as.Date does not help either: some rows look like 03/26/2015 06:56:30 PM while others look like 04-12-15 20:24. The two formats are completely different, so a single as.Date call cannot handle both.

Apart from as.Date, one option would be to convert every 04-12-15 20:24 value to the 03/26/2015 06:56:30 PM format, for example by replacing each - with / in the date part.
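A minimal base R sketch of that normalization idea, assuming only the two formats shown above occur. Swapping separators alone would not be enough, because the formats also differ in year width (15 vs 2015) and clock (24-hour vs AM/PM), so the sketch parses each format separately and writes both back out in one common format. The helper name normalize_date is hypothetical, and parsing %p assumes an English or C locale.

```r
# Hypothetical helper: parse both known formats, return one canonical string.
normalize_date <- function(x) {
  # Try the "mm-dd-yy HH:MM" (24-hour) format; non-matching rows become NA.
  parsed <- as.POSIXct(x, format = "%m-%d-%y %H:%M", tz = "UTC")
  # Try the "mm/dd/yyyy hh:mm:ss AM/PM" format for every row as well.
  slash <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
  # Fill the rows the first format could not parse with the second result.
  missing <- is.na(parsed)
  parsed[missing] <- slash[missing]
  # Emit a single, locale-independent format; unparseable rows stay NA.
  format(parsed, "%Y-%m-%d %H:%M:%S")
}

dates <- c("03/26/2015 06:56:30 PM", "04-12-15 20:24")
normalized <- normalize_date(dates)
print(normalized)
```

Rows that match neither format come back as NA, which makes them easy to find and inspect before modelling.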

I don't know how to achieve this goal.

A screenshot of part of the data is attached.

You can use case_when() from the dplyr package to detect which format each date string uses and convert it accordingly. From there, the hour on the 24-hour clock determines the time of day, using the bins from the question.

library(dplyr)

chicago15 <- data.frame(Date = c("03/26/2015 06:56:30 PM","04-12-15 20:24",
                             "03/26/2015 06:56:30 AM","04-12-15 21:24",
                             "12/31/2017 03:28:43 AM"))

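chicago15 %>% 
  # Choose the parsing format per row: strings containing '-' are
  # mm-dd-yy HH:MM (24-hour); all others are mm/dd/yyyy hh:mm:ss AM/PM.
  # tz = 'UTC' keeps the wall-clock time as written and avoids NAs for
  # times that fall in a local daylight-saving gap.
  dplyr::mutate(Date2 = dplyr::case_when(
    grepl('-',Date) ~ as.POSIXct(Date,format = '%m-%d-%y %H:%M',tz = 'UTC'),
    TRUE ~ as.POSIXct(Date,format = '%m/%d/%Y %I:%M:%S %p',tz = 'UTC')
  )) %>%
  # Bin on the 24-hour clock. Conditions are checked top to bottom, so
  # hours 0-3 fall through to the final 'night'.
  dplyr::mutate(Time_of_Day = dplyr::case_when(
    as.numeric(format(Date2,'%H')) >= 21 ~ 'night',
    as.numeric(format(Date2,'%H')) >= 12 ~ 'evening',
    as.numeric(format(Date2,'%H')) >= 4 ~ 'morning',
    TRUE ~ 'night'
  ))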

  Date                   Date2               Time_of_Day
1 03/26/2015 06:56:30 PM 2015-03-26 18:56:30     evening
2         04-12-15 20:24 2015-04-12 20:24:00     evening
3 03/26/2015 06:56:30 AM 2015-03-26 06:56:30     morning
4         04-12-15 21:24 2015-04-12 21:24:00       night
5 12/31/2017 03:28:43 AM 2017-12-31 03:28:43       night
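Since the end goal is a logistic regression, it may help to store the bins as a factor rather than plain strings. The following is a sketch in base R, not part of the answer above: the helper name time_of_day is hypothetical, and it uses findInterval() with the same cut points (4, 12, 21) as the case_when() conditions.

```r
# Hypothetical helper: map an hour (0-23) to a time-of-day factor.
time_of_day <- function(hour) {
  # findInterval() returns 0 for hours < 4, 1 for 4-11, 2 for 12-20,
  # and 3 for 21-23; both ends of the day map to "night".
  idx <- findInterval(hour, c(4, 12, 21))
  labels <- c("night", "morning", "evening", "night")
  factor(labels[idx + 1], levels = c("morning", "evening", "night"))
}

# Already-parsed timestamps matching the Date2 column above.
stamps <- as.POSIXct(c("2015-03-26 18:56:30", "2015-04-12 20:24:00",
                       "2015-03-26 06:56:30", "2015-04-12 21:24:00",
                       "2017-12-31 03:28:43"), tz = "UTC")
hours <- as.integer(format(stamps, "%H"))
tod <- time_of_day(hours)
print(tod)
```

A factor like this can be used directly in a model formula, for example glm(arrest ~ crime_type + time_of_day, family = binomial), where arrest and crime_type are placeholder column names that do not come from the post.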

