
Reducing time series data from half hour to hourly in R

I am working with smart meter data at half-hourly resolution. Due to the sheer volume of data, I am trying to reduce it from half-hourly to hourly resolution. To do so, I am attempting to sum the consumption of each pair of half-hourly measurements. The issue is that I also have categorical data in my data frame, which I lose when using xts. This is what my data looks like:

> head(test1)
      LCLid stdorToU            DateTime KWH.hh..per.half.hour.   Acorn Acorn_grouped
1 MAC000002      Std 2012-10-12 00:30:00                      0 ACORN-A      Affluent
2 MAC000002      Std 2012-10-12 01:00:00                      0 ACORN-A      Affluent
3 MAC000002      Std 2012-10-12 01:30:00                      0 ACORN-A      Affluent
4 MAC000002      Std 2012-10-12 02:00:00                      0 ACORN-A      Affluent
5 MAC000002      Std 2012-10-12 02:30:00                      0 ACORN-A      Affluent
6 MAC000002      Std 2012-10-12 03:00:00                      0 ACORN-A      Affluent

Here is the code I have been attempting to use and the result I get.

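library(lubridate)  # ymd_hms()
library(xts)        # xts(), endpoints(), period.apply()

test1 <- read.csv("test.csv", stringsAsFactors = FALSE)
test1$DateTime <- ymd_hms(test1$DateTime)
test1$KWH.hh..per.half.hour. <- as.numeric(test1$KWH.hh..per.half.hour.)
# only the numeric column goes into the xts object, so the categorical columns are dropped here
test2 <- xts(test1$KWH.hh..per.half.hour., test1$DateTime)
head(test2)
period.apply(test2, endpoints(test2, "hours"), sum)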

> period.apply(test2, endpoints(test2, "hours"), sum)
                     [,1]
2012-10-12 00:30:00 0.000
2012-10-12 01:30:00 0.000
2012-10-12 02:30:00 0.000
2012-10-12 03:30:00 0.000
2012-10-12 04:30:00 0.000
2012-10-12 05:30:00 0.000
2012-10-12 06:30:00 0.000
2012-10-12 07:30:00 0.000
2012-10-12 08:30:00 0.000
2012-10-12 09:30:00 0.000
2012-10-12 10:30:00 0.000

Ideally, I need a data set exactly like my original (test1), just half the size, aggregated to hourly frequency rather than half-hourly. Can someone please help?

Thanks

You need to create a grouping column, and then sum by group.

# create grouped column
test1$grouped_time = lubridate::floor_date(test1$DateTime, unit = "hour")
# (use ceiling_date instead if you want to round the half hours up instead of down)

# sum by group
library(dplyr)
test2 = test1 %>%
  group_by(grouped_time, LCLid, stdorToU, Acorn, Acorn_grouped) %>%
  summarize(KWH.hh.per.hour = sum(KWH.hh..per.half.hour.))
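
To see how the choice between floor_date and ceiling_date changes which half hours get paired, here is a minimal sketch on made-up data. The toy data frame, its values, and the names hour_floor, hour_ceiling, by_floor and by_ceiling are illustrative assumptions, not part of the original data; the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(lubridate)

# Toy data: four half-hourly readings for one meter (values are made up)
toy <- data.frame(
  LCLid = "MAC000002",
  DateTime = ymd_hms(c("2012-10-12 00:30:00", "2012-10-12 01:00:00",
                       "2012-10-12 01:30:00", "2012-10-12 02:00:00")),
  KWH = c(0.1, 0.2, 0.3, 0.4)
)

# floor_date pairs 01:00 with 01:30; ceiling_date pairs 00:30 with 01:00
toy$hour_floor   <- floor_date(toy$DateTime, unit = "hour")
toy$hour_ceiling <- ceiling_date(toy$DateTime, unit = "hour")

# sum per meter and floored hour
by_floor <- toy %>%
  group_by(LCLid, hour_floor) %>%
  summarise(KWH.per.hour = sum(KWH), .groups = "drop")

# sum per meter and ceilinged hour
by_ceiling <- toy %>%
  group_by(LCLid, hour_ceiling) %>%
  summarise(KWH.per.hour = sum(KWH), .groups = "drop")
```

With these four readings, by_floor ends up with three rows (00:30 and 02:00 have no partner) and by_ceiling with two. If the timestamps mark the end of each metering interval, ceiling_date is probably the grouping that matches a clock hour, but that depends on how the meter data is stamped.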

There are many alternatives to dplyr at the Sum by Group R-FAQ, in case you want to look at more options.
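
As one example of such an alternative, the same sum-by-group can be sketched in base R with aggregate(). This is only an illustration on made-up data; the toy data frame and the short column name KWH are assumptions, and trunc() is used here in place of lubridate::floor_date.

```r
# Toy data: two meters, two half-hourly readings each (values are made up)
toy <- data.frame(
  LCLid = c("A", "A", "B", "B"),
  DateTime = as.POSIXct(c("2012-10-12 01:00:00", "2012-10-12 01:30:00",
                          "2012-10-12 01:00:00", "2012-10-12 01:30:00"),
                        tz = "UTC"),
  KWH = c(0.1, 0.2, 0.3, 0.4)
)

# trunc() drops the minutes and returns POSIXlt, so convert back to POSIXct
toy$grouped_time <- as.POSIXct(trunc(toy$DateTime, units = "hours"))

# sum KWH for every combination of hour and meter
hourly <- aggregate(KWH ~ grouped_time + LCLid, data = toy, FUN = sum)
```

Every categorical column that should survive has to appear on the right-hand side of the formula, just as it has to appear in group_by() in the dplyr version.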

Note that this will sum the KWH column for each unique combination of the other columns in group_by(). If some of those can change, for example if stdorToU or the ACORN values might change from one half hour to the next but you still want the rows combined, you need to move that column out of group_by and into summarize, and specify which value to keep, e.g.

# if ACORN can change and you want to keep the first one
# (the column is named Acorn, so it must be referenced with that exact case)
test2 = test1 %>%
  group_by(grouped_time, LCLid, stdorToU, Acorn_grouped) %>%
  summarize(KWH.hh.per.hour = sum(KWH.hh..per.half.hour.),
            Acorn = first(Acorn))
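
The effect of moving a column out of group_by can be checked on a tiny made-up example. The toy data frame and its values are assumptions for illustration only, and .groups assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data: one meter, one hour, but the Acorn label differs between the two
# half-hourly rows (values are made up)
toy <- data.frame(
  LCLid = "MAC000002",
  grouped_time = as.POSIXct("2012-10-12 01:00:00", tz = "UTC"),
  Acorn = c("ACORN-A", "ACORN-B"),
  KWH = c(0.1, 0.2)
)

# Acorn is not a grouping column, so both rows fall into one group
hourly <- toy %>%
  group_by(grouped_time, LCLid) %>%
  summarise(KWH.per.hour = sum(KWH),
            Acorn = first(Acorn),
            .groups = "drop")
```

Here hourly has a single row that keeps the first Acorn label. With Acorn left inside group_by, the two rows would stay separate and nothing would be summed.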

> head(sm_2013_tof)
# A tibble: 6 x 6
# Groups:   grouped_time, LCLid, stdorToU, Acorn [6]
  grouped_time        LCLid     stdorToU Acorn   Acorn_grouped KWH.hh.per.hour
  <dttm>              <chr>     <chr>    <chr>   <chr>                   <dbl>
1 2013-01-01 00:00:00 MAC000146 ToU      ACORN-L Adversity               0.155
2 2013-01-01 00:00:00 MAC000147 ToU      ACORN-F Comfortable             0.276
3 2013-01-01 00:00:00 MAC000158 ToU      ACORN-H Comfortable             0.152
4 2013-01-01 00:00:00 MAC000165 ToU      ACORN-E Affluent                0.401
5 2013-01-01 00:00:00 MAC000170 ToU      ACORN-F Comfortable             0.64 
6 2013-01-01 00:00:00 MAC000173 ToU      ACORN-E Affluent                0.072
> 

Here is the now-hourly data after grouping.

If I convert it with as.data.frame, the 00:00:00 disappears:

sm_short_2013 <- as.data.frame(sm_2013_tof)

> head(sm_short_2013)
  grouped_time     LCLid stdorToU   Acorn Acorn_grouped KWH.hh.per.hour
1   2013-01-01 MAC000146      ToU ACORN-L     Adversity           0.155
2   2013-01-01 MAC000147      ToU ACORN-F   Comfortable           0.276
3   2013-01-01 MAC000158      ToU ACORN-H   Comfortable           0.152
4   2013-01-01 MAC000165      ToU ACORN-E      Affluent           0.401
5   2013-01-01 MAC000170      ToU ACORN-F   Comfortable           0.640
6   2013-01-01 MAC000173      ToU ACORN-E      Affluent           0.072
> dput(droplevels(sm_short_2013[1:10, ]))
structure(list(grouped_time = structure(c(1356998400, 1356998400, 
1356998400, 1356998400, 1356998400, 1356998400, 1356998400, 1356998400, 
1356998400, 1356998400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    LCLid = c("MAC000146", "MAC000147", "MAC000158", "MAC000165", 
    "MAC000170", "MAC000173", "MAC000186", "MAC000187", "MAC000193", 
    "MAC000194"), stdorToU = c("ToU", "ToU", "ToU", "ToU", "ToU", 
    "ToU", "ToU", "ToU", "ToU", "ToU"), Acorn = c("ACORN-L", 
    "ACORN-F", "ACORN-H", "ACORN-E", "ACORN-F", "ACORN-E", "ACORN-E", 
    "ACORN-L", "ACORN-D", "ACORN-D"), Acorn_grouped = c("Adversity", 
    "Comfortable", "Comfortable", "Affluent", "Comfortable", 
    "Affluent", "Affluent", "Adversity", "Affluent", "Affluent"
    ), KWH.hh.per.hour = c(0.155, 0.276, 0.152, 0.401, 0.64, 
    0.072, 0.407, 0.554, 0.725, 0.158)), row.names = c(NA, 10L
), class = "data.frame")
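
The dput output above still shows class POSIXct with tzone "UTC", so the time of day does not appear to be lost, only hidden when printing: base R's default format for POSIXct omits the time when every value is exactly midnight. A small sketch, reusing the epoch value 1356998400 from the dput output (the second value and the variable names are assumptions for illustration):

```r
# Two midnight timestamps; 1356998400 is taken from the dput output above
x <- as.POSIXct(c(1356998400, 1356998400 + 86400),
                origin = "1970-01-01", tz = "UTC")

# default formatting drops the time because all values are midnight
default_print <- format(x)

# an explicit format string always shows the time component
full_print <- format(x, "%Y-%m-%d %H:%M:%S")

# the hour can still be extracted, so the underlying values are intact
hours <- as.integer(format(x, "%H"))
```

Once head() includes rows from later hours, the data frame print should show the full timestamps again without any conversion.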
