
Extract year, month and day when dates are in a non-standard format

I have a column of dates and want to extract the year, month and day into separate columns. Unfortunately, the dates column contains inconsistent entries, so the usual solutions of format(as.Date(),"%Y") or lubridate::year() don't work.

Here is an example data frame:

dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"))

And here is the desired result:

       dates year month  day
1 1985-03-23 1985     3   23
2            <NA>  <NA> <NA>
3       1983 1983  <NA> <NA>
4    1984-01 1984     1 <NA>

I can achieve the desired result with the following code, but it is very slow on large datasets (>100,000 rows):

dates_df$year <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[1])
dates_df$month <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[2])
dates_df$day <- sapply(dates_df$dates, function(x) unlist(strsplit(x, "\\-"))[3])

My question:

Is there a more efficient (faster) way to extract year, month and day columns from messy date data?

Using strsplit and adapting the lengths.

cbind(dates_df, t(sapply(strsplit(dates_df$dates, '-'), `length<-`, 3)))
#        dates    1    2    3
# 1 1985-03-23 1985   03   23
# 2            <NA> <NA> <NA>
# 3       1983 1983 <NA> <NA>
# 4    1984-01 1984   01 <NA>
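To see why this works, it may help to look at a single element. The sketch below is only an illustration of the padding step: assigning a longer length to a vector fills the new positions with NA, and an empty string splits into a zero-length vector, so it ends up as three NAs. The names `parts` and `x` are made up for this example, and `fixed = TRUE` is an optional addition that treats '-' as a literal string rather than a regular expression.

```r
# Split each string once on the literal "-"
parts <- strsplit(c("1984-01", ""), "-", fixed = TRUE)

# Extending the length pads the missing pieces with NA
x <- parts[[1]]
length(x) <- 3
x
# "1984" "01" NA

# The empty string splits into character(0), so all three slots become NA
empty_padded <- `length<-`(parts[[2]], 3)
empty_padded
# NA NA NA
```

Because strsplit runs only once over the whole column here, instead of three times per row as in the question, this should do noticeably less work on large inputs.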

With nice names:

cbind(dates_df, `colnames<-`(
  t(sapply(strsplit(dates_df$dates, '-'), `length<-`, 3)), c('year', 'month', 'day')))
#        dates year month  day
# 1 1985-03-23 1985    03   23
# 2            <NA>  <NA> <NA>
# 3       1983 1983  <NA> <NA>
# 4    1984-01 1984    01 <NA>
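Note that the columns above are character strings ("03" rather than 3). If numeric columns are wanted, as in the desired result, and if every non-empty entry follows the fixed positions of YYYY-MM-DD (an assumption that holds for the example data but may not hold for the real column), a fully vectorised sketch with substr could look like this:

```r
dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"),
                       stringsAsFactors = FALSE)

# substr() is vectorised; positions past the end of a string yield "",
# and as.integer("") is NA, so missing parts become NA
dates_df$year  <- as.integer(substr(dates_df$dates, 1, 4))
dates_df$month <- as.integer(substr(dates_df$dates, 6, 7))
dates_df$day   <- as.integer(substr(dates_df$dates, 9, 10))

dates_df
#        dates year month day
# 1 1985-03-23 1985     3  23
# 2              NA    NA  NA
# 3       1983 1983    NA  NA
# 4    1984-01 1984     1  NA
```

This avoids any per-row function call, but it would give wrong or NA values for entries such as "1985-3-23" where the parts are not zero-padded, so the splitting approach is the safer choice when the format varies.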

My first thought would have been to try tidyr::separate. It is untested for speed and might break down if there are date formats not represented in the example data.

tidyr::separate(dates_df, 
                dates, 
                into = c('year', 'month', 'day'), 
                remove = FALSE)

#-----
       dates year month  day
1 1985-03-23 1985    03   23
2                  <NA> <NA>
3       1983 1983  <NA> <NA>
4    1984-01 1984    01 <NA>
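The call above warns about rows with fewer than three pieces and leaves the parts as strings. As a possible variation (a sketch only, also untested for speed), separate has `sep`, `fill` and `convert` arguments: `fill = "right"` pads short rows with NA on the right without warning, and `convert = TRUE` asks tidyr to convert the new columns to a more specific type where it can. The name `out` is just for illustration.

```r
library(tidyr)

dates_df <- data.frame(dates = c("1985-03-23", "", "1983", "1984-01"),
                       stringsAsFactors = FALSE)

out <- separate(dates_df,
                dates,
                into = c("year", "month", "day"),
                sep = "-",          # split on the hyphen only
                remove = FALSE,     # keep the original dates column
                fill = "right",     # pad missing month/day with NA
                convert = TRUE)     # turn "03" into 3 where possible
out
```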
