Is there a way to clean date and time data in R?

I am trying to label times from 4 am to 12 pm as morning, 12 pm to 9 pm as evening, and 9 pm to 4 am as night. I want to use this in a logistic regression model that predicts whether an arrest happens, given the type of crime and the time of the crime.

I tried using the lubridate package, but the column is stored as strings and I could not get its functions to work. The as.Date function does not help either, because some rows look like 03/26/2015 06:56:30 PM while others look like 04-12-15 20:24. The two formats are completely different, so a single as.Date call cannot parse both.

One alternative to as.Date would be to convert every 04-12-15 20:24 value to the 03/26/2015 06:56:30 PM format, for example by replacing each - with / in the date part.

I don't know how to achieve this.

(A picture of part of the data was attached.)

You can use case_when() from the dplyr package to detect which format each string uses and then convert it with the matching format string. From there, we check the hour of the 24-hour time to assign the time of day based on the bins in the question.

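library(dplyr)

# Sample data containing both timestamp formats from the question
chicago15 <- data.frame(Date = c("03/26/2015 06:56:30 PM","04-12-15 20:24",
                             "03/26/2015 06:56:30 AM","04-12-15 21:24",
                             "12/31/2017 03:28:43 AM"))

chicago15 %>% 
  # Strings containing '-' use the short 24-hour format; the rest use the
  # 12-hour AM/PM format. tz = 'UTC' avoids NAs from local daylight-saving gaps.
  dplyr::mutate(Date2 = dplyr::case_when(
    grepl('-',Date) ~ as.POSIXct(Date,format = '%m-%d-%y %H:%M',tz = 'UTC'),
    TRUE ~ as.POSIXct(Date,format = '%m/%d/%Y %I:%M:%S %p',tz = 'UTC')
  )) %>%
  # Conditions are checked in order, so each hour falls into the first bin it
  # matches; hours before 4 drop through to the final 'night' case.
  dplyr::mutate(Time_of_Day = dplyr::case_when(
    as.numeric(format(Date2,'%H')) >= 21 ~ 'night',
    as.numeric(format(Date2,'%H')) >= 12 ~ 'evening',
    as.numeric(format(Date2,'%H')) >= 4 ~ 'morning',
    TRUE ~ 'night'
  ))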

  Date                   Date2               Time_of_Day
1 03/26/2015 06:56:30 PM 2015-03-26 18:56:30     evening
2         04-12-15 20:24 2015-04-12 20:24:00     evening
3 03/26/2015 06:56:30 AM 2015-03-26 06:56:30     morning
4         04-12-15 21:24 2015-04-12 21:24:00       night
5 12/31/2017 03:28:43 AM 2017-12-31 03:28:43       night
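
If you would rather not depend on dplyr, the same idea can be sketched in base R. The helper names parse_mixed() and time_of_day() below are made up for illustration and do not come from any package. The sketch tries each candidate format in turn and assumes the timestamps can be treated as UTC, so adjust tz if the real time zone matters.

```r
# Hypothetical helper: try each format in order and keep the first
# successful parse for every element. tz = "UTC" is an assumption.
parse_mixed <- function(x, formats, tz = "UTC") {
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # Strings that do not match fmt come back as NA and are retried
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

# Hypothetical helper: map the hour (0-23) to the bins from the question.
time_of_day <- function(dt) {
  hour <- as.integer(format(dt, "%H"))
  ifelse(hour >= 4 & hour < 12, "morning",
         ifelse(hour >= 12 & hour < 21, "evening", "night"))
}

dates <- c("03/26/2015 06:56:30 PM", "04-12-15 20:24",
           "03/26/2015 06:56:30 AM", "04-12-15 21:24",
           "12/31/2017 03:28:43 AM")

parsed <- parse_mixed(dates, c("%m/%d/%Y %I:%M:%S %p", "%m-%d-%y %H:%M"))
labels <- time_of_day(parsed)
print(data.frame(Date = dates, Parsed = parsed, Time_of_Day = labels))
```

Any string that matches none of the formats stays NA in both columns, which makes unexpected formats in the full data set easy to spot with is.na().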
