Formatting dates with ordinal suffixes (-st, -nd, -rd, -th on the day of the month)

Question

Am I missing something? I can't figure out how to convert the following to Dates, where the day of the month (%d) has the ordinal suffixes -st, -nd, -rd, -th:

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

?strptime doesn't appear to list a shorthand for the ordinal suffix, and it isn't handled automagically:

as.Date(ord_dates, format = c("%B %d, %Y"))
#[1] NA NA NA NA

Is there a token for handling ignored characters in the format argument? A token I'm missing?

The best I can come up with is (there may be a shorter regex, but same idea):

as.Date(gsub("([0-9]+)(st|nd|rd|th)", "\\1", ord_dates), format = "%B %d, %Y")
# [1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

Seems like this sort of data should be relatively common; am I missing something?

Answer 1

Enjoy the power of lubridate:

library(lubridate)    
mdy(ord_dates)

[1] "2016-09-01" "2016-09-02" "2016-09-03" "2016-09-04"

Internally, lubridate doesn't have any special conversion specifications which enable this. Rather, lubridate first uses (by smart guessing) the format "%B %dst, %Y". This gets the first element of ord_dates.

It then checks for NAs and repeats its smart guessing on the remaining elements, settling on "%B %dnd, %Y" to get the second element. It continues in this way until there are no NAs left (which happens in this case after 4 iterations), or until its smart guessing fails to turn up a likely format candidate.
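To make that loop concrete, here is a rough base-R sketch of the same idea. It is not lubridate's actual code: parse_with_formats is a hypothetical helper, and the four candidate formats are supplied by hand instead of being guessed. It also assumes an English LC_TIME locale, since %B matches locale-specific month names.

```r
# Hypothetical helper (not part of lubridate): try each candidate format in
# turn and fill in only the elements that are still NA.
parse_with_formats <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break          # everything parsed, stop early
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

ord_dates <- c("September 1st, 2016", "September 2nd, 2016",
               "September 3rd, 2016", "September 4th, 2016")

# Candidate formats written out by hand; lubridate would guess these.
formats <- paste0("%B %d", c("st", "nd", "rd", "th"), ", %Y")

parse_with_formats(ord_dates, formats)
```

Each pass only re-parses the elements that failed so far, so the cost grows with the number of distinct formats needed.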

You can imagine this makes lubridate slower, and it does: it runs at about half the speed of simply using the smart regex suggested by @alistaire above:

set.seed(109123)
ord_dates <- sample(
  c("September 1st, 2016", "September 2nd, 2016",
    "September 3rd, 2016", "September 4th, 2016"),
  1e6, TRUE
  )

library(microbenchmark)

microbenchmark(times = 10L,
               lubridate = mdy(ord_dates),
               base = as.Date(sub("\\D+,", "", ord_dates),
                              format = "%B %e %Y"))
# Unit: seconds
#       expr      min       lq     mean   median       uq      max neval cld
#  lubridate 2.167957 2.219463 2.290950 2.252565 2.301725 2.587724    10   b
#       base 1.183970 1.224824 1.218642 1.227034 1.228324 1.229095    10  a

The obvious advantage in lubridate's favor is its conciseness and flexibility.
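In case the base call in the benchmark looks cryptic, here it is split into its two steps on a small input. This is only an illustration; the month-name parsing again assumes an English LC_TIME locale.

```r
x <- c("September 1st, 2016", "September 3rd, 2016")

# Step 1: "\\D+," matches the first run of non-digits that ends in a comma,
# which here is the ordinal suffix plus the comma ("st,", "rd,").
stripped <- sub("\\D+,", "", x)
stripped
# [1] "September 1 2016" "September 3 2016"

# Step 2: parse what is left; on input %e reads the day of the month like %d.
as.Date(stripped, format = "%B %e %Y")
# [1] "2016-09-01" "2016-09-03"   (in an English locale)
```

Unlike the regex in the question, this pattern removes the comma as well, which is why the format string has none.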

Answer 1: score 9, accepted, 2016-08-30 21:29:17
