Extract year, month and day when dates are in a non-standard format
I have a column of dates and want to extract the year, month and day into separate columns. Unfortunately, the dates column contains inconsistent entries, so the usual solution of using format(as.Date(),"%Y") or lubridate::year() doesn't work.
Here is an example data frame:
dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"))
And here is the desired result:
       dates year month  day
1 1985-03-23 1985     3   23
2            <NA>  <NA> <NA>
3       1983 1983  <NA> <NA>
4    1984-01 1984     1 <NA>
I can achieve the desired result with the following code, but it is very slow on large datasets (>100,000 rows):
dates_df$year <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[1])
dates_df$month <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[2])
dates_df$day <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[3])
My question:
Is there a more efficient (faster) way to extract year, month and day columns from messy date data?
Using strsplit and adapting the lengths, so that every split result is padded with NA to length 3:
cbind(dates_df, t(sapply(strsplit(dates_df$dates, '-'), `length<-`, 3)))
#        dates    1    2    3
# 1 1985-03-23 1985   03   23
# 2            <NA> <NA> <NA>
# 3       1983 1983 <NA> <NA>
# 4    1984-01 1984   01 <NA>
With nice names:
cbind(dates_df, `colnames<-`(
t(sapply(strsplit(dates_df$dates, '-'), `length<-`, 3)), c('year', 'month', 'day')))
#        dates year month  day
# 1 1985-03-23 1985    03   23
# 2            <NA>  <NA> <NA>
# 3       1983 1983  <NA> <NA>
# 4    1984-01 1984    01 <NA>
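The result above holds character values ("03" rather than 3), while the desired output in the question shows plain numbers. As a minimal sketch, assuming every non-missing piece is numeric, the same strsplit plus `length<-` idea can be wrapped in a helper that also converts to integer. The name split_dates is hypothetical and not part of the answer above; vapply is swapped in for sapply only so the result shape is checked.

```r
# Hypothetical helper: split "YYYY-MM-DD"-like strings into integer columns.
# Assumes pieces are separated by "-" and that each piece is numeric.
split_dates <- function(x) {
  # as.character() guards against factor input (the default in R < 4.0)
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  # `length<-` pads (or truncates) each vector of pieces to length 3 with NA
  m <- t(vapply(parts, `length<-`, character(3), 3))
  colnames(m) <- c("year", "month", "day")
  # Convert the whole character matrix to integer in one step ("03" becomes 3)
  storage.mode(m) <- "integer"
  as.data.frame(m)
}

dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"))
result <- cbind(dates_df, split_dates(dates_df$dates))
```

Because strsplit is called once on the whole column instead of three times per row, this avoids most of the repeated work in the original sapply version. Actual timings were not measured here.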
My first thought would have been to try tidyr::separate. It is untested for speed and might break down if the data contain date formats not represented in the example.
tidyr::separate(dates_df,
                dates,
                into = c('year', 'month', 'day'),
                remove = FALSE)
#-----
#        dates year month  day
# 1 1985-03-23 1985    03   23
# 2                  <NA> <NA>
# 3       1983 1983  <NA> <NA>
# 4    1984-01 1984    01 <NA>