Convert Date to year month representation
I have a Date and want to represent it as an integer of the form yyyymm. Currently, I do:
get_year_month <- function(d) { return(as.integer(format(d, "%Y%m")))}
mydate = seq.Date(from = as.Date("2012-01-01"), to = as.Date("5012-01-01"), by = 1)
system.time(ym <- get_year_month(mydate))
# user system elapsed
# 5.972 0.974 6.951
This is very slow for large datasets. Is there a faster way? Please provide timings for your answers so they can be easily compared. Use the above example.
Using functions from the lubridate package can be almost twice as fast as your function:
mydate = as.Date(rep("2012-01-01",1000))
library(lubridate)
library(microbenchmark)
microbenchmark(get_year_month(mydate),
year(mydate)*100+month(mydate))
gives:
R> Unit: milliseconds
expr min lq median uq
get_year_month(mydate) 2.150296 2.188370 2.218176 2.285973
year(mydate) * 100 + month(mydate) 1.220016 1.228129 1.239704 1.284568
You can try using the yearmon class from the zoo package. In general, if you are doing time series manipulation and analysis, I would suggest using xts or at least zoo classes. xts has a lot of functionality for analyzing very large time series data.
Here is a quick benchmark against the other suggested solutions.
library(zoo)            # as.yearmon
library(lubridate)      # year, month
library(microbenchmark)
get_year_month <- function(d) {
  return(as.integer(format(d, "%Y%m")))
}
mydate = as.Date(rep("2012-01-01", 1e+06))
microbenchmark(get_year_month(mydate), year(mydate) * 100 + month(mydate), as.yearmon(mydate, format = "%Y-%m-%d"), times = 1)
## Unit: milliseconds
## expr min lq median uq max neval
## get_year_month(mydate) 1049.8813 1049.8813 1049.8813 1049.8813 1049.8813 1
## year(mydate) * 100 + month(mydate) 434.1765 434.1765 434.1765 434.1765 434.1765 1
## as.yearmon(mydate, format = "%Y-%m-%d") 249.6704 249.6704 249.6704 249.6704 249.6704 1
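Note that as.yearmon() returns a yearmon object rather than the yyyymm integer the question asks for. Here is a minimal sketch of one way to get the integer from it, assuming zoo's documented representation of a yearmon as year plus (month - 1)/12. The helper name yearmon_to_int is hypothetical and not part of zoo, and this extra step is not included in the timings above.

```r
library(zoo)

# Hypothetical helper (not part of zoo): convert a yearmon object
# to an integer of the form yyyymm.
yearmon_to_int <- function(ym) {
  num <- as.numeric(ym)              # year + (month - 1) / 12
  yr  <- floor(num)                  # calendar year
  mon <- round((num - yr) * 12) + 1  # month, 1 to 12
  as.integer(yr * 100 + mon)
}

d <- as.Date(c("2012-01-01", "2012-12-31"))
ym <- yearmon_to_int(as.yearmon(d))
```

The round() call guards against floating-point error in the fractional part before the month is recovered.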
It would be best to keep your Dates in POSIXlt format if you want to manipulate them like that:
> system.time(ym <- get_year_month(mydate))
user system elapsed
4.039 0.025 4.079
> system.time(mydatep <- as.POSIXlt(mydate))
user system elapsed
3.576 0.016 3.603
> system.time(ym <- (1900 + mydatep$year)*100 + (mydatep$mon + 1))
user system elapsed
0.010 0.005 0.015
Even including the one-time conversion to POSIXlt, it's still a little faster, and subsequent similar operations then cost almost nothing in terms of time.
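The POSIXlt arithmetic above can be wrapped in a small function. This is only a sketch using base R; the name get_year_month_lt is made up for illustration. It relies on the fact that the year field of a POSIXlt counts years since 1900 and the mon field is zero-based.

```r
# Hypothetical helper: yyyymm integer computed from POSIXlt fields (base R only).
get_year_month_lt <- function(d) {
  lt <- as.POSIXlt(d)
  # lt$year is years since 1900; lt$mon runs from 0 (January) to 11 (December)
  as.integer((lt$year + 1900L) * 100L + (lt$mon + 1L))
}

d <- as.Date(c("2012-01-01", "2012-12-31", "1999-07-15"))
ym_lt  <- get_year_month_lt(d)
ym_fmt <- as.integer(format(d, "%Y%m"))  # the approach from the question

# Both approaches should agree on these dates
stopifnot(identical(ym_lt, ym_fmt))
```

If the dates are reused for several such computations, it is cheaper to call as.POSIXlt() once and keep the result, as in the timings above, rather than converting inside the function each time.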
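There may not be a faster way for a single item. However, you can make a version of the function that operates on a whole collection, e.g. the wrapper below. Note that replicate() re-evaluates the same expression n times rather than mapping over elements, so the sketch uses vapply() instead. Also, get_year_month() is already vectorized through format(), so this wrapper is unlikely to beat calling get_year_month(D) directly, and it should be timed before being relied on.
get_year_month_each <- function(D) {
  # apply get_year_month() to each element of the Date vector D
  x <- vapply(seq_along(D), function(i) get_year_month(D[i]), integer(1))
  return(x)
}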