Is there a way to clean date and time data in R?
I am trying to bin times from 4 AM to 12 PM as morning, 12 PM to 9 PM as evening, and 9 PM to 4 AM as night. I am doing this to build a logistic regression model that predicts whether an arrest happens, given the type of crime and the time of the crime.
I have tried using lubridate, but because the column is stored as a string I am not able to use its functions. The as.Date function doesn't help either, since some of the strings have this value: 03/26/2015 06:56:30 PM
while other rows have this value: 04-12-15 20:24
The two formats are completely different, so I can't use as.Date.
Apart from as.Date, one thing we could do is convert every value like 04-12-15 20:24 into the 03/26/2015 06:56:30 PM format, for example: if you find a - then replace it with a / (for the date part).
I don't know how to achieve this.
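Rewriting - as / on its own would not make the two layouts identical, because the year width (15 vs 2015), the seconds field, and the clock (24-hour vs AM/PM) also differ. A minimal base-R sketch of a different route is to parse the column once per format and keep whichever parse succeeds. The variable names are illustrative, tz = "UTC" is an assumption, and %p only recognizes AM/PM in an English or C locale.

```r
# Two example strings in the two formats quoted in the question
x <- c("03/26/2015 06:56:30 PM", "04-12-15 20:24")

# Each format string matches only one layout; non-matching strings become NA.
# tz = "UTC" is an assumption: it avoids NAs for local times skipped by daylight saving.
fmt_12h <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
fmt_24h <- as.POSIXct(x, format = "%m-%d-%y %H:%M", tz = "UTC")

# Keep the 12-hour parse where it worked, otherwise fall back to the 24-hour parse
parsed <- fmt_12h
parsed[is.na(parsed)] <- fmt_24h[is.na(parsed)]

print(parsed)
```

Any string matching neither format stays NA, which makes unexpected layouts easy to find with which(is.na(parsed)).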
You can use case_when() from the dplyr library to detect the format of each date string and then convert it with the matching format. From there, we extract the hour on the 24-hour clock and assign the time of day using the bins given in the OP.
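library(dplyr)

chicago15 <- data.frame(Date = c("03/26/2015 06:56:30 PM", "04-12-15 20:24",
                                 "03/26/2015 06:56:30 AM", "04-12-15 21:24",
                                 "12/31/2017 03:28:43 AM"))

chicago15 %>%
  # Pick the format per row: a dash means "mm-dd-yy HH:MM" (24-hour clock),
  # otherwise "mm/dd/yyyy hh:mm:ss AM/PM" (12-hour clock).
  # tz = 'UTC' avoids NAs for local times skipped by daylight saving.
  dplyr::mutate(Date2 = dplyr::case_when(
    grepl('-', Date) ~ as.POSIXct(Date, format = '%m-%d-%y %H:%M', tz = 'UTC'),
    TRUE ~ as.POSIXct(Date, format = '%m/%d/%Y %I:%M:%S %p', tz = 'UTC')
  )) %>%
  # Bin on the 24-hour hour; case_when() checks conditions top to bottom
  dplyr::mutate(Time_of_Day = dplyr::case_when(
    is.na(Date2) ~ NA_character_,  # unparsed dates must not fall through to 'night'
    as.numeric(format(Date2, '%H')) >= 21 ~ 'night',
    as.numeric(format(Date2, '%H')) >= 12 ~ 'evening',
    as.numeric(format(Date2, '%H')) >= 4 ~ 'morning',
    TRUE ~ 'night'  # hours 0 to 3
  ))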
Date Date2 Time_of_Day
1 03/26/2015 06:56:30 PM 2015-03-26 18:56:30 evening
2 04-12-15 20:24 2015-04-12 20:24:00 evening
3 03/26/2015 06:56:30 AM 2015-03-26 06:56:30 morning
4 04-12-15 21:24 2015-04-12 21:24:00 night
5 12/31/2017 03:28:43 AM 2017-12-31 03:28:43 night
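The chained case_when() above depends on its conditions being checked in order. An equivalent, order-independent way to express the same bins is base R's findInterval() with breakpoints 0, 4, 12 and 21. This is a sketch; time_of_day() is a hypothetical helper name, and the test times below are made up only to check the bin boundaries.

```r
# Hypothetical helper: map a POSIXct vector to the question's time-of-day bins
time_of_day <- function(dt) {
  hour <- as.integer(format(dt, "%H"))
  # Breakpoints 0, 4, 12, 21 give the intervals [0,4), [4,12), [12,21), [21,24)
  labels <- c("night", "morning", "evening", "night")
  # findInterval() returns the interval index (NA for NA input), used to pick a label
  labels[findInterval(hour, c(0, 4, 12, 21))]
}

# Boundary times: 3:59 night, 4:00 morning, 12:00 evening, 20:59 evening, 21:00 night
dt <- as.POSIXct(c("2015-03-26 03:59:00", "2015-03-26 04:00:00",
                   "2015-03-26 12:00:00", "2015-03-26 20:59:00",
                   "2015-03-26 21:00:00"), tz = "UTC")
print(time_of_day(dt))
```

Unparsed (NA) dates come back as NA rather than being labelled night, matching the is.na() guard in the case_when() version.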