简体   繁体   English

如何将每 10 分钟间隔捕获的数据转换为 15 分钟间隔数据

[英]How to convert data captured every 10 min interval into 15 min interval data

I have a dataframe with below data ( Average of the values of timestamp 7.50 and 7.40 should be my value of A for time Stamp 7.45)我有一个包含以下数据的数据框(时间戳 7.50 和 7.40 的平均值应该是时间戳 7.45 的 A 值)

Date_Time     |         A
7/28/2017 8:00|     443.75
7/28/2017 7:50|     440.75
7/28/2017 7:45|     NA
7/28/2017 7:40|     447.5
7/28/2017 7:30|     448.75
7/28/2017 7:20|     444.5
7/28/2017 7:15|     NA
7/28/2017 7:10|     440.25
7/28/2017 7:00|     447.5

I want it to transform into 15 min interval something like below using mean:我希望它使用平均值转换为 15 分钟间隔,如下所示:

Date / Time   |    Object Value
7/28/2017 8:00|        465
7/28/2017 7:45|        464.875
7/28/2017 7:30|        464.75
7/28/2017 7:15|        464.875
7/28/2017 7:00|        465

Updat更新

The OP changes his or her desired output. OP 改变他或她想要的输出。 Since I have no time to update my answer, I will leave my answer as it is.由于我没有时间更新我的答案,我将保留我的答案。 See my comment in the original post to see how to use na.interpolation to fill in the missing values.请参阅我在原帖中的评论,了解如何使用na.interpolation来填充缺失值。

Original Post原帖

This solution assumes you calculated the average based on the average values in 8:00 , 7:30 , and 7:00 .此解决方案假定您根据8:007:307:00的平均值计算平均值。

library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)


dt2 <- dt %>%
  mutate(Date.Time = mdy_hm(Date.Time)) %>%
  filter(Date.Time %in% seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  complete(Date.Time = seq(min(Date.Time), max(Date.Time), by = "15 min")) %>%
  mutate(Object.Value = na.interpolation(Object.Value)) %>%
  fill(Object.Name) %>%
  arrange(desc(Date.Time))

dt2
# A tibble: 5 x 3
            Date.Time Object.Name Object.Value
               <dttm>       <chr>        <dbl>
1 2017-07-28 08:00:00           a      465.000
2 2017-07-28 07:45:00           a      464.875
3 2017-07-28 07:30:00           a      464.750
4 2017-07-28 07:15:00           a      464.875
5 2017-07-28 07:00:00           a      465.000

Data数据

dt <- read.table(text = "'Date Time' 'Object Name' 'Object Value'
'7/28/2017 8:00' a 465
                 '7/28/2017 7:50' a 465
                 '7/28/2017 7:40' a 464.75
                 '7/28/2017 7:30' a 464.75
                 '7/28/2017 7:20' a 464.75
                 '7/28/2017 7:10' a 465
                 '7/28/2017 7:00' a 465",
                 header = TRUE, stringsAsFactors = FALSE)

If the values measured on the 10-minute intervals are time-integrated averages over that period, it's reasonable to average them to a different period.如果以 10 分钟为间隔测量的值是该时间段内的时间积分平均值,则将它们平均到不同的时间段是合理的。 If these are instantaneous measurements, then it's more reasonable to smooth them as others have suggested.如果这些是瞬时测量值,那么像其他人建议的那样平滑它们更合理。

To take time-integrated averages measured on the 10-minute schedule and average those to the 15-minute schedule, you can use the intervalaverage package:要获取在 10 分钟计划上测量的时间积分平均值并将其平均到 15 分钟计划,您可以使用 intervalaverage 包:


library(data.table)
library(intervalaverage)
x <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:50", "7/28/2017 7:45", 
                             "7/28/2017 7:40", "7/28/2017 7:30", "7/28/2017 7:20", "7/28/2017 7:15", 
                             "7/28/2017 7:10", "7/28/2017 7:00"), A = c(443.75, 440.75, NA, 
                                                                        447.5, 448.75, 444.5, NA, 440.25, 447.5)), row.names = c(NA, 
                                                                                                                                 -9L), class = "data.frame")
y <- structure(list(time = c("7/28/2017 8:00", "7/28/2017 7:45", "7/28/2017 7:30", 
                             "7/28/2017 7:15", "7/28/2017 7:00")), row.names = c(NA, -5L), class = "data.frame")

setDT(x)
setDT(y)

x
#>              time      A
#> 1: 7/28/2017 8:00 443.75
#> 2: 7/28/2017 7:50 440.75
#> 3: 7/28/2017 7:45     NA
#> 4: 7/28/2017 7:40 447.50
#> 5: 7/28/2017 7:30 448.75
#> 6: 7/28/2017 7:20 444.50
#> 7: 7/28/2017 7:15     NA
#> 8: 7/28/2017 7:10 440.25
#> 9: 7/28/2017 7:00 447.50
y
#>              time
#> 1: 7/28/2017 8:00
#> 2: 7/28/2017 7:45
#> 3: 7/28/2017 7:30
#> 4: 7/28/2017 7:15
#> 5: 7/28/2017 7:00

x[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(x, "time","start_time")
x[, start_time_integer:=as.integer(start_time)]

y[, time:=as.POSIXct(time,format='%m/%d/%Y %H:%M',tz = "UTC")]
setnames(y, "time","start_time")
y[, start_time_integer:=as.integer(start_time)]

setkey(y, start_time)
setkey(x, start_time)


##drop time times at 15 and 45
x <- x[!start_time %in% as.POSIXct(c("2017-07-28 07:45:00","2017-07-28 07:15:00"),tz="UTC")]


x[, end_time_integer:=as.integer(start_time)+60L*10L-1L]
x[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]


y[, end_time_integer:=as.integer(start_time)+60L*15L-1L]
y[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz = "UTC")]


x
#>             start_time      A start_time_integer end_time_integer
#> 1: 2017-07-28 07:00:00 447.50         1501225200       1501225799
#> 2: 2017-07-28 07:10:00 440.25         1501225800       1501226399
#> 3: 2017-07-28 07:20:00 444.50         1501226400       1501226999
#> 4: 2017-07-28 07:30:00 448.75         1501227000       1501227599
#> 5: 2017-07-28 07:40:00 447.50         1501227600       1501228199
#> 6: 2017-07-28 07:50:00 440.75         1501228200       1501228799
#> 7: 2017-07-28 08:00:00 443.75         1501228800       1501229399
#>               end_time
#> 1: 2017-07-28 07:09:59
#> 2: 2017-07-28 07:19:59
#> 3: 2017-07-28 07:29:59
#> 4: 2017-07-28 07:39:59
#> 5: 2017-07-28 07:49:59
#> 6: 2017-07-28 07:59:59
#> 7: 2017-07-28 08:09:59
y
#>             start_time start_time_integer end_time_integer            end_time
#> 1: 2017-07-28 07:00:00         1501225200       1501226099 2017-07-28 07:14:59
#> 2: 2017-07-28 07:15:00         1501226100       1501226999 2017-07-28 07:29:59
#> 3: 2017-07-28 07:30:00         1501227000       1501227899 2017-07-28 07:44:59
#> 4: 2017-07-28 07:45:00         1501227900       1501228799 2017-07-28 07:59:59
#> 5: 2017-07-28 08:00:00         1501228800       1501229699 2017-07-28 08:14:59
out <- intervalaverage(x,y,interval_vars=c("start_time_integer","end_time_integer"),value_vars="A")

out[, start_time:=as.POSIXct(start_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, end_time:=as.POSIXct(end_time_integer,origin="1969-12-31 24:00:00",tz="UTC")]
out[, list(start_time,end_time, A)]
#>             start_time            end_time        A
#> 1: 2017-07-28 07:00:00 2017-07-28 07:14:59 445.0833
#> 2: 2017-07-28 07:15:00 2017-07-28 07:29:59 443.0833
#> 3: 2017-07-28 07:30:00 2017-07-28 07:44:59 448.3333
#> 4: 2017-07-28 07:45:00 2017-07-28 07:59:59 443.0000
#> 5: 2017-07-28 08:00:00 2017-07-28 08:14:59       NA


#Note that this just equivalent to taking weighted.mean:

weighted.mean(c(447.5,440.25),w=c(10,5))
#> [1] 445.0833
weighted.mean(c(440.25,444.5),w=c(5,10))
#> [1] 443.0833
#etc

Note that the intervalaverage package requires integer columns defining closed intervals, hence the conversion to integer.请注意,intervalaverage 包需要定义闭区间的整数列,因此转换为整数。 integers are converted back to datetime (POSIXct) for readability.整数被转换回日期时间 (POSIXct) 以提高可读性。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM