Split time series data hourly in R

I have time-series data sampled at a 10-minute rate. I want to split it hour-wise, but to my surprise split.xts is not producing the intended results. The steps I used are:

library(xts)
set.seed(123)
Sys.setenv(TZ="Asia/Kolkata")
timeind <- seq(as.POSIXct("2017-01-20 00:00:00 IST"),
               as.POSIXct("2017-01-20 23:59:59 IST"),by="10 min") # 10-minute timestamps for one day
df <- xts(runif(length(timeind),30,50),timeind) # xts object (not a data.frame)
split(df,"hours",k=1)

Output:

[[1]]
                        [,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684

[[2]]
                        [,1]
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881

Expected output:

[[1]]
                        [,1]
2017-01-20 00:00:00 31.24343
2017-01-20 00:10:00 32.57921
2017-01-20 00:20:00 40.17684
2017-01-20 00:30:00 41.89185
2017-01-20 00:40:00 30.93997
2017-01-20 00:50:00 31.76651

[[2]]
                        [,1]
2017-01-20 01:00:00 49.07364
2017-01-20 01:10:00 34.79113
2017-01-20 01:20:00 48.13881
...

Why is split.xts not working properly?

It's a known bug. split.xts finds the hour boundaries with endpoints. If the index time zone is not offset from UTC by a whole number of hours, endpoints does not work correctly, because its calculations are based on UTC.

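For example, Asia/Kolkata is UTC+0530, so endpoints aligns on half-hours. Local midnight is 18:30 UTC, and the next UTC hour boundary (19:00 UTC) falls at 00:30 local time. That is why the second group in the output starts at 00:30.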
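The following base R sketch shows where those half-hour boundaries come from, and it does not need xts. It mimics UTC-based hourly bucketing by integer-dividing seconds since the epoch by 3600. This illustrates the arithmetic only and is not the actual endpoints implementation. The three-hour range is an arbitrary choice to keep the output short.

```r
# Illustration only: mimic UTC-based hourly bucketing in base R.
# This is not the xts code, just the same UTC arithmetic.
timeind <- seq(as.POSIXct("2017-01-20 00:00:00", tz = "Asia/Kolkata"),
               as.POSIXct("2017-01-20 02:50:00", tz = "Asia/Kolkata"),
               by = "10 min")

# Local midnight in Kolkata (UTC+0530) is 18:30 UTC on the previous day.
print(format(timeind[1], "%Y-%m-%d %H:%M", tz = "UTC"))  # "2017-01-19 18:30"

# POSIXct stores seconds since 1970-01-01 UTC, so %/% 3600 gives a UTC hour number.
utc_hour <- as.numeric(timeind) %/% 3600

# Group the local clock times by UTC hour.
groups <- split(format(timeind, "%H:%M"), utc_hour)

# Boundaries land on :30 local time, so the group sizes are 3, 6, 6, 3.
print(unname(lengths(groups)))
print(groups[[2]][1])  # "00:30"
```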

A possible workaround is to subtract 30 minutes from the index before calling split, then add 30 minutes back to each element of the result. For a UTC+0530 zone, a 30-minute shift in either direction lines local hours up with UTC hours. This might cause issues around daylight saving time transitions if the time zone observes DST. Asia/Kolkata does not.

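# Shift the index back 30 minutes so local hour boundaries
# (e.g. 01:00 IST = 19:30 UTC) land on whole UTC hours
df_adjusted <- df
.index(df_adjusted) <- .index(df_adjusted) - 60 * 30
# Split on the now-aligned hours, then restore the original timestamps
by_hour <- lapply(split(df_adjusted, "hours"),
           function(x) { .index(x) <- .index(x) + 60 * 30; x })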
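A different technique avoids the UTC dependency altogether, and it is not the one this answer proposes. It builds a grouping key from each timestamp's local calendar hour with format() and splits on that key. The sketch below uses only base R and a plain data.frame. The names dat, hour_key and hourly are illustrative. To apply this to the xts object from the question, you would build the key from index(df). That step is not shown here and is left as an assumption.

```r
# Alternative sketch (base R only): group by the local clock hour
# instead of by UTC-based hour boundaries.
timeind <- seq(as.POSIXct("2017-01-20 00:00:00", tz = "Asia/Kolkata"),
               as.POSIXct("2017-01-20 23:59:59", tz = "Asia/Kolkata"),
               by = "10 min")
dat <- data.frame(time = timeind, value = runif(length(timeind), 30, 50))

# Keys such as "2017-01-20 00", "2017-01-20 01", ... are formatted in the
# time zone stored on the POSIXct column (Asia/Kolkata here).
hour_key <- format(dat$time, "%Y-%m-%d %H")

# split() on a data.frame returns one data.frame per distinct key.
hourly <- split(dat, hour_key)

print(length(hourly))                     # 24
print(format(hourly[[1]]$time, "%H:%M"))  # "00:00" "00:10" "00:20" "00:30" "00:40" "00:50"
```

The key is the local clock hour, so this approach should work for any UTC offset, including zones such as UTC+0545. In a zone that observes daylight saving time, however, the hour repeated at a fall-back transition gets a single label, so two physical hours would be merged into one group.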

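Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.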

 