
Format multiple date formats in one column using lubridate

Sometimes I am given data sets that use two different date formats but share common variables, so they have to be joined into one data frame. Over the years, I've tried various solutions to get around this workflow hassle. Now that I've been using lubridate, it seems like many of these problems are easily solved. However, I am encountering some behaviour that seems weird to me, though I imagine there is a good explanation that is beyond me. Say I am given data sets with different date formats that I join into one data frame, which looks like this:

library(lubridate)
library(dplyr)

# stringsAsFactors = FALSE keeps DATE as character in R < 4.0
df <- data.frame(Lab = c("A", "B"), DATE = c("12/15/15", "12/15/2013"),
                 stringsAsFactors = FALSE)
df

I want to convert this column to dates with lubridate. However, the following does not parse the dates consistently:

df %>% 
  mutate(mdy(DATE))

...but instead gives the first date the year 0015. If I filter just for Lab "A":

df %>% 
  filter(Lab=="A") %>%
  mutate(mdy(DATE))

...or even group by Lab:

df %>% 
  group_by(Lab) %>%
  mutate(mdy(DATE))

Then I get the desired year. Is this the correct behaviour of the lubridate family of date-parsing functions? Is there a better way to accomplish what I am doing? I am sure that multiple date formats in one column is a relatively common (and annoying) occurrence.

Thanks in advance.
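One plausible explanation, sketched in base R under the assumption (suggested by the 0015 result) that the lubridate version in use settled on a single `%m/%d/%Y` format for the whole column: `%Y` accepts the two-digit "15" as the literal year 15, while `%y` maps it to 2015. Within `group_by(Lab)` (or after `filter()`), `mdy()` is called on one group at a time, and in this data each group holds only one of the two layouts, which would explain why grouping appears to fix the problem.

```r
# Base R only: how the two candidate year formats treat the same strings.
short_year <- "12/15/15"
long_year  <- "12/15/2013"

# %Y reads up to four digits, so "15" becomes the year 15 (shown as 0015).
with_Y <- as.Date(c(short_year, long_year), format = "%m/%d/%Y")

# %y reads a two-digit year; on input, 00-68 are mapped to 2000-2068.
with_y <- as.Date(short_year, format = "%m/%d/%y")

# Helper to extract the calendar year as a number for comparison.
year_of <- function(d) as.POSIXlt(d)$year + 1900L

year_of(with_Y)  # 15 2013
year_of(with_y)  # 2015
```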

From the help page for parse_date_time:

## ** how to use select_formats **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_format function:

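## score = number of % directives, plus 1.5 if the format uses %y; keep the best.
## `drop` and `...` are accepted because newer lubridate versions pass extra arguments.
my_select <- function(trained, drop = FALSE, ...){
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
  names(trained[ which.max(n_fmts) ])
}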

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
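Applied to the question's own data frame, the help-page selector could be used inside `mutate()` roughly as follows. This is a sketch: the `drop = FALSE, ...` arguments are there on the assumption that your lubridate version may pass extra arguments to `select_formats`, and the POSIXct result is converted to `Date` because the column holds no times.

```r
library(lubridate)
library(dplyr)

# Selector adapted from the parse_date_time() help page: score each trained
# format by its number of % directives, add a bonus when it uses %y, and keep
# the best one. Per the help-page output, four-digit years still parse.
my_select <- function(trained, drop = FALSE, ...) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) +
    grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}

df <- data.frame(Lab = c("A", "B"), DATE = c("12/15/15", "12/15/2013"),
                 stringsAsFactors = FALSE)

# Parse the whole column at once, giving %y priority over %Y.
df <- df %>%
  mutate(DATE_parsed = as.Date(
    parse_date_time(DATE, orders = "mdy", select_formats = my_select)
  ))
df
```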

The parse_date_time function from the lubridate package can parse multiple date formats in one go.

Syntax:

# format1, format2, ... stand for order strings such as "mdy" or "ymd"
df$date <- parse_date_time(df$date, orders = c(format1, format2, format3))

You need to specify every format (order) that occurs in the column.
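As a concrete illustration of the syntax, here is a hypothetical column mixing month/day/year and year-month-day strings, parsed by listing both orders. It deliberately uses four-digit years, since a two-digit year is exactly the ambiguous case this question is about.

```r
library(lubridate)

# Hypothetical input: the same day written in two different layouts.
mixed <- c("12/15/2013", "2013-12-15")

# List every order that can occur; lubridate decides which fits each string.
parsed <- parse_date_time(mixed, orders = c("mdy", "ymd"))
as.Date(parsed)
```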

Since lubridate has difficulty correctly interpreting some formats (such as two-digit versus four-digit years), you may need to define a custom format-selection function.

The help page contains the illustration below; you can adapt it to suit your requirements.

## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_format function:

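## score = number of % directives, plus 1.5 if the format uses %y; keep the best.
## `drop` and `...` are accepted because newer lubridate versions pass extra arguments.
my_select <- function(trained, drop = FALSE, ...){
   n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained))*1.5
   names(trained[ which.max(n_fmts) ])
}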

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
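If you would rather not rely on how lubridate ranks candidate formats (which may differ between versions), a small hypothetical helper can choose the format explicitly from the shape of each string. This sketch assumes every value is month/day/year with either a two- or four-digit year; anything else comes back as NA.

```r
# Hypothetical helper, not part of lubridate: pick %y or %Y per string
# depending on whether the year has two or four digits.
parse_mdy_mixed <- function(x) {
  x <- as.character(x)
  two_digit  <- grepl("^\\d{1,2}/\\d{1,2}/\\d{2}$", x)
  four_digit <- grepl("^\\d{1,2}/\\d{1,2}/\\d{4}$", x)

  # Start with all-NA dates, then fill each subset with its own format.
  out <- as.Date(rep(NA_character_, length(x)))
  out[two_digit]  <- as.Date(x[two_digit],  format = "%m/%d/%y")
  out[four_digit] <- as.Date(x[four_digit], format = "%m/%d/%Y")
  out
}

df <- data.frame(Lab = c("A", "B"), DATE = c("12/15/15", "12/15/2013"),
                 stringsAsFactors = FALSE)
df$DATE_parsed <- parse_mdy_mixed(df$DATE)
df
```

Because each subset is parsed with a fixed format, the result does not depend on which other values happen to share the column.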

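Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.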

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM