繁体   English   中英

dataframe 中的日期标准化 R

[英]Date standardisation in a dataframe in R

在我的 dataframe 中,我有表格的日期

YYYY-MM-DD
YYYY.MM.DD
YYYY-MM-DD HH:MM
我想将其标准化为以下形式:
YYYY-MM-DD
在 R 中。

我已经尝试过 R 中的 parse_date_time() function,但并非所有列都被解析。 为什么呢? 任何帮助,将不胜感激:-)

编辑
的使用示例

parse_date_time(emails$mail_sent_date)

这是

[1] NA               "2010-04-17 UTC" "2012-01-26 UTC" "2014-11-15 UTC" "2014-07-17 UTC" "2010-02-22 UTC" "2010-07-17 UTC" "2012-10-27 UTC" "2014-01-18 UTC"
 [10] "2010-01-11 UTC" NA               "2010-11-28 UTC" "2012-09-24 UTC" "2014-05-30 UTC" "2014-05-30 UTC" "2010-07-31 UTC" "2007-07-28 UTC" NA              
 [19] "2014-08-29 UTC" "2015-06-05 UTC" "2008-11-03 UTC" "2018-03-18 UTC" "2019-01-12 UTC" "2011-07-23 UTC" NA               "2007-11-19 UTC" "2019-04-07 UTC"
 [28] "2010-11-28 UTC" "2019-11-22 UTC" "2019-03-28 UTC" "2013-06-22 UTC" "2013-12-08 UTC" "2012-06-08 UTC" "2011-12-09 UTC" "2017-10-23 UTC" "2017-03-26 UTC"
 [37] "2019-01-31 UTC" "2020-03-14 UTC" "2014-05-30 UTC" "2011-12-31 UTC" "2015-05-14 UTC" "2010-03-27 UTC" "2014-12-08 UTC" "2015-05-24 UTC" "2014-11-15 UTC"
 [46] NA               "2018-05-26 UTC" "2019-02-28 UTC" NA               "2015-06-11 UTC" "2012-06-09 UTC" "2013-06-16 UTC" NA               "2014-07-12 UTC"
 [55] "2012-09-20 UTC" "2010-05-22 UTC" "2019-11-07 UTC" "2011-03-07 UTC" "2007-10-05 UTC" "2018-03-17 UTC" "2007-06-22 UTC" "2007-02-01 UTC" "2020-03-29 UTC"
 [64] "2010-03-21 UTC" "2019-02-28 UTC" NA               "2008-03-17 UTC" "2013-03-14 UTC" "2014-05-12 UTC" "2015-12-19 UTC" "2010-04-05 UTC" NA              
 [73] "2008-02-07 UTC" "2007-08-12 UTC" "2011-12-02 UTC" "2014-02-02 UTC" "2011-07-25 UTC" "2014-06-12 UTC" NA               NA               "2013-10-06 UTC"
 [82] "2019-05-18 UTC" "2011-12-19 UTC" NA               "2012-03-18 UTC" "2013-07-22 UTC" "2017-01-21 UTC" "2013-09-26 UTC" "2019-04-18 UTC" "2012-10-01 UTC"
 [91] "2018-09-01 UTC" "2019-11-22 UTC" "2013-07-05 UTC" "2013-07-22 UTC" "2008-10-11 UTC" "2018-04-29 UTC" NA               "2019-06-24 UTC" "2018-04-19 UTC"
[100] "2015-08-21 UTC" NA               NA               "2015-04-09 UTC" "2012-02-11 UTC" "2011-11-13 UTC" "2013-04-11 UTC" "2007-10-07 UTC" "2007-10-08 UTC"
[109] "2012-01-14 UTC" "2012-06-02 UTC" "2011-07-04 UTC" "2019-05-17 UTC" "2012-09-09 UTC" NA               "2018-09-29 UTC" "2015-06-04 UTC" "2014-01-13 UTC"
[118] "2014-01-13 UTC" "2012-09-24 UTC" "2018-05-28 UTC" "2018-07-21 UTC" "2010-04-26 UTC" "2011-02-20 UTC" "2013-06-21 UTC" "2008-12-14 UTC" "2011-04-25 UTC"
[127] "2014-07-31 UTC" "2015-06-08 UTC" "2015-10-25 UTC" "2019-06-29 UTC" "2011-02-21 UTC" "2017-01-09 UTC" NA               "2015-06-21 UTC" "2014-07-28 UTC"
[136] "2013-11-04 UTC" "2014-07-24 UTC" NA               "2019-09-13 UTC" "2007-06-09 UTC" "2014-12-13 UTC" "2015-10-16 UTC" "2010-06-19 UTC" "2015-05-14 UTC"
[145] "2011-07-29 UTC" "2007-10-01 UTC" NA               NA               "2010-09-25 UTC" "2010-04-15 UTC" "2020-03-05 UTC" "2017-06-30 UTC" NA              
[154] "2019-06-10 UTC" "2018-10-04 UTC" "2015-05-11 UTC" "2010-05-22 UTC" "2014-07-26 UTC" "2015-01-25 UTC" "2015-07-04 UTC" "2015-07-04 UTC" "2014-07-17 UTC"
[163] "2010-09-18 UTC" "2007-01-08 UTC" "2019-10-21 UTC" "2014-06-30 UTC" "2008-08-01 UTC" NA               "2010-08-13 UTC" NA               NA              
[172] "2012-11-24 UTC" "2014-11-20 UTC" "2018-05-14 UTC" "2015-10-05 UTC" "2020-01-26 UTC" "2018-04-21 UTC" "2011-07-04 UTC" "2015-02-22 UTC" "2015-02-22 UTC"
[181] "2008-10-11 UTC" "2017-01-05 UTC" "2011-05-21 UTC" NA               "2015-09-27 UTC" "2011-08-28 UTC" "2019-03-09 UTC" "2018-11-29 UTC" "2014-07-11 UTC"
[190] "2013-06-14 UTC" "2018-06-04 UTC" "2014-11-03 UTC" "2019-03-01 UTC" "2007-10-12 UTC" "2018-01-06 UTC" NA               "2010-11-28 UTC" "2017-10-23 UTC"
[199] "2014-03-23 UTC" "2018-11-11 UTC" "2019-05-18 UTC" "2014-10-02 UTC" NA               NA               "2011-07-31 UTC" "2010-07-16 UTC" "2015-04-09 UTC"
[208] "2015-10-01 UTC" "2015-10-09 UTC" "2011-04-01 UTC" "2018-11-11 UTC" "2018-11-11 UTC" "2011-08-28 UTC" "2018-07-21 UTC" NA               "2011-02-21 UTC"
[217] "2018-03-17 UTC" NA               "2014-05-11 UTC" "2012-03-23 UTC" "2014-05-25 UTC" "2014-03-23 UTC" "2013-01-20 UTC" NA               "2014-07-11 UTC"
[226] "2014-09-08 UTC" "2013-05-24 UTC" NA               "2010-07-17 UTC" NA               "2019-01-01 UTC" NA               "2013-06-15 UTC" "2019-01-19 UTC"
[235] "2020-02-02 UTC" "2013-03-14 UTC" "2012-08-04 UTC" "2015-02-13 UTC" "2010-06-18 UTC" NA               "2013-10-20 UTC" "2015-12-17 UTC" "2017-09-01 UTC"
[244] "2013-03-28 UTC" "2010-04-01 UTC" "2017-07-24 UTC" "2007-09-30 UTC" "2017-05-27 UTC" NA               "2006-11-17 UTC" "2007-11-18 UTC" "2019-12-01 UTC"
[253] "2015-10-12 UTC" "2015-03-27 UTC" "2017-12-02 UTC" "2018-09-03 UTC" "2018-03-04 UTC" "2015-03-14 UTC" NA               "2010-01-25 UTC" "2008-07-04 UTC"
[262] "2015-04-29 UTC" "2013-04-05 UTC" NA               "2007-11-02 UTC" "2010-06-13 UTC" "2019-02-16 UTC" "2015-04-09 UTC" "2013-07-27 UTC" NA              
[271] "2018-08-25 UTC" "2019-06-14 UTC"
Warning message:
 39 failed to parse. 

当我使用 ymd() 时返回了类似的警告消息

1)假设格式正是问题中显示的格式(如果不是,请解决问题),那么这仅使用基础 R。 这利用了 as.Date 最后会忽略垃圾的事实。

x <- c("2000-10-01", "2000.10.01", "2000-10-01 03:04")

as.Date(chartr(".", "-", x))
## [1] "2000-10-01" "2000-10-01" "2000-10-01"

2)另一种方法是随时package:

library(anytime)

anydate(x)
## [1] "2000-10-01" "2000-10-01" "2000-10-01"

使用lubridate package 和ymd function

library(lubridate)
ymd(column_of_your_dataframe)

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM