
Convert Date to year month representation

I have a Date and am interested in representing it as an integer of yyyymm form. Currently, I do:

get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from = as.Date("2012-01-01"), to = as.Date("5012-01-01"), by = 1) 
system.time(ym <- get_year_month(mydate))
#    user  system elapsed 
#    5.972   0.974   6.951 

This is very slow for large datasets. Is there a faster way? Please provide timings for your answers, using the example above, so they can be easily compared.

Using the year() and month() functions from the lubridate package can be almost twice as fast as your function:

mydate = as.Date(rep("2012-01-01",1000))
library(lubridate)
library(microbenchmark)
microbenchmark(get_year_month(mydate),
               year(mydate)*100+month(mydate))

gives:

R> Unit: milliseconds
                               expr      min       lq   median       uq
             get_year_month(mydate) 2.150296 2.188370 2.218176 2.285973
 year(mydate) * 100 + month(mydate) 1.220016 1.228129 1.239704 1.284568

You can try using the yearmon class from the zoo package. In general, if you are doing time series manipulation and analysis, I would suggest using xts or at least zoo classes. xts has a lot of functionality for analysing very large time series datasets.

Here is a quick benchmark against the other suggested solutions:

library(lubridate)
library(zoo)
library(microbenchmark)

get_year_month <- function(d) {
    return(as.integer(format(d, "%Y%m")))
}
mydate = as.Date(rep("2012-01-01", 1e+06))

microbenchmark(get_year_month(mydate), year(mydate) * 100 + month(mydate), as.yearmon(mydate, format = "%Y-%m-%d"), times = 1)
## Unit: milliseconds
##                                     expr       min        lq    median        uq       max neval
##                   get_year_month(mydate) 1049.8813 1049.8813 1049.8813 1049.8813 1049.8813     1
##       year(mydate) * 100 + month(mydate)  434.1765  434.1765  434.1765  434.1765  434.1765     1
##  as.yearmon(mydate, format = "%Y-%m-%d")  249.6704  249.6704  249.6704  249.6704  249.6704     1
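Note that as.yearmon() returns a "yearmon" object rather than an integer in yyyymm form, so its timing above is not quite like-for-like with the other two. If you need the integer, the zoo documentation describes yearmon values as the year plus (month - 1)/12, so a conversion can be sketched as below. This is a minimal sketch under that assumption: the helper name yearmon_to_yyyymm is hypothetical, it accepts plain numerics so it runs without zoo installed, and no claim is made about its speed.

```r
# Hypothetical helper: convert yearmon-style values (year + (month - 1) / 12,
# the representation zoo documents for its "yearmon" class) to yyyymm integers.
# Plain numerics are accepted, so this runs without zoo installed.
yearmon_to_yyyymm <- function(ym) {
  # Total months since year 0; round() absorbs floating-point error in the twelfths
  total_months <- round(as.numeric(ym) * 12)
  year <- total_months %/% 12
  month <- total_months %% 12 + 1
  as.integer(year * 100 + month)
}

yearmon_to_yyyymm(c(2012, 2012 + 11/12))
# [1] 201201 201212
```

Assuming zoo is loaded, yearmon_to_yyyymm(as.yearmon(mydate)) should then give the same values as get_year_month(mydate).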

It would be best to keep your dates in POSIXlt format if you want to manipulate them like that:

> system.time(ym <- get_year_month(mydate))
   user  system elapsed 
  4.039   0.025   4.079 
> system.time(mydatep <- as.POSIXlt(mydate))
   user  system elapsed 
  3.576   0.016   3.603 
> system.time(ym <- (1900 + mydatep$year)*100 + (mydatep$mon + 1))
   user  system elapsed 
  0.010   0.005   0.015 

It is still a little faster overall, and once the dates are in POSIXlt form, subsequent similar operations are almost free in terms of time.
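The POSIXlt arithmetic above can be wrapped in a drop-in replacement for get_year_month(). This is a minimal sketch (the name get_year_month_lt is hypothetical): POSIXlt stores years as an offset from 1900 and months as 0 to 11, hence the adjustments, and as.integer() keeps the result an integer as the question asks. In the timings above, the as.POSIXlt() conversion accounts for nearly all of the time, so this mainly pays off when the POSIXlt object is reused.

```r
# Hypothetical drop-in replacement for get_year_month() based on POSIXlt fields
get_year_month_lt <- function(d) {
  lt <- as.POSIXlt(d)
  # $year counts years since 1900; $mon is zero-based (0 = January)
  as.integer((1900L + lt$year) * 100L + (lt$mon + 1L))
}

# Compare with the format()-based version from the question on a few dates
d <- as.Date(c("2012-01-01", "2012-12-31", "5012-01-01"))
stopifnot(identical(get_year_month_lt(d), as.integer(format(d, "%Y%m"))))
```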

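There may not be a faster way for a single item. However, a version of the function that operates on a whole collection can run much faster when the collection contains many repeated dates (as in the rep() benchmarks above), because the expensive formatting then only needs to be done once per distinct date, e.g.

# Format each distinct date once, then map the results back to every element
get_year_month_collection <- function(D) {
  u <- unique(D)
  get_year_month(u)[match(D, u)]
}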

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.