How to obtain the hourly average of values in a time series data frame with multiple columns
I have time series data with three columns: dates, energy values, and station names. I want to obtain the hourly average of the energy values separately for each station.
My data looks like this:
df
Datetime Energy Station
1 2016-01-01 07:19:00 743.0253 Ajmer
2 2016-01-01 07:20:00 765.7225 Ajmer
3 2016-01-01 07:21:00 788.1493 Ajmer
4 2016-01-01 08:20:00 834.7815 Ajmer
5 2016-01-01 08:21:00 857.3012 Ajmer
6 2016-01-31 16:58:00 3427.098 Kotada
7 2016-01-31 16:59:00 3397.591 Kotada
8 2016-01-31 17:00:00 3344.149 Kotada
9 2016-01-31 17:01:00 3270.803 Kotada
Expected output:
Datetime Energy Station
1. 2016-01-01 07:00:00 765.6324 Ajmer
2. 2016-01-01 08:00:00 846.0413 Ajmer
3. 2016-01-31 16:00:00 3412.345 Kotada
4. 2016-01-31 17:00:00 3307.476 Kotada
I tried the group_by function to form a data frame grouped by station name, and then used the aggregate function to obtain the hourly average, but it's not working.
> byStn=df %>% group_by(Station)
> hour_byStn=byStn %>%
+ aggregate(energy,
+ list(hourtime = cut(Datetime, breaks="hour")),
+ mean, na.rm = TRUE)
I obtained the following error: Error in cut(Datetime, breaks = "hour") : object 'Datetime' not found.
Can you please tell me how to do this? This is the first time I am working with time series data, and with the dplyr package as well.
We can use floor_date from lubridate to floor 'Datetime' to the hour, use that in group_by along with 'Station', and get the mean of 'Energy':
library(lubridate)
library(tidyverse)
df %>%
group_by(Datetime = floor_date(Datetime, "hour"), Station) %>%
summarise(Energy = mean(Energy, na.rm = TRUE))
# A tibble: 4 x 3
# Groups: Datetime [4]
# Datetime Station Energy
# <dttm> <chr> <dbl>
#1 2016-01-01 07:00:00 Ajmer 766.
#2 2016-01-01 08:00:00 Ajmer 846.
#3 2016-01-31 16:00:00 Kotada 3412.
#4 2016-01-31 17:00:00 Kotada 3307.
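The sample data in this answer stores its timestamps with tzone = "", so the printed hours depend on the time zone of the R session. A minimal sketch, assuming the timestamps are meant to be read as UTC (an assumption, not something the question states), rebuilds the data from the values shown in the question so the result is reproducible. The name df_utc is hypothetical, and .groups = "drop" (available in dplyr 1.0.0 and later) returns an ungrouped result:

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question; tz = "UTC" is an assumption
df_utc <- data.frame(
  Datetime = as.POSIXct(c(
    "2016-01-01 07:19:00", "2016-01-01 07:20:00", "2016-01-01 07:21:00",
    "2016-01-01 08:20:00", "2016-01-01 08:21:00", "2016-01-31 16:58:00",
    "2016-01-31 16:59:00", "2016-01-31 17:00:00", "2016-01-31 17:01:00"
  ), tz = "UTC"),
  Energy = c(743.0253, 765.7225, 788.1493, 834.7815, 857.3012,
             3427.098, 3397.591, 3344.149, 3270.803),
  Station = c(rep("Ajmer", 5), rep("Kotada", 4))
)

# One row per station and hour, holding the mean energy in that hour
hourly <- df_utc %>%
  group_by(Station, Datetime = floor_date(Datetime, "hour")) %>%
  summarise(Energy = mean(Energy, na.rm = TRUE), .groups = "drop")

print(hourly)
```

Grouping by Station first only changes the row order; the hourly means are the same as in the output above.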
Data:
df <- structure(list(Datetime = structure(c(1451650740, 1451650800,
1451650860, 1451654400, 1451654460, 1454277480, 1454277540, 1454277600,
1454277660), class = c("POSIXct", "POSIXt"), tzone = ""), Energy = c(743.0253,
765.7225, 788.1493, 834.7815, 857.3012, 3427.098, 3397.591, 3344.149,
3270.803), Station = c("Ajmer", "Ajmer", "Ajmer", "Ajmer", "Ajmer",
"Kotada", "Kotada", "Kotada", "Kotada")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")
I haven't tested it, but you want something along the lines of this:
df %>%
mutate(hourtime = cut(Datetime, breaks='hour')) %>%
group_by(Station, hourtime) %>%
summarise(avg_energy = mean(Energy, na.rm = TRUE))
I would suggest reading up on some basic dplyr syntax. I referenced this religiously when I first started using it: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
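For comparison, the aggregate() attempt from the question can also be made to work in base R without group_by. aggregate() evaluates its arguments in the calling environment rather than inside the piped data frame, which is why the bare name Datetime was not found. The sketch below stores the hourly bin as a column and uses the formula interface instead. The names df_utc and hourtime and the UTC time zone are assumptions for illustration:

```r
# Sample data copied from the question; tz = "UTC" is an assumption
df_utc <- data.frame(
  Datetime = as.POSIXct(c(
    "2016-01-01 07:19:00", "2016-01-01 07:20:00", "2016-01-01 07:21:00",
    "2016-01-01 08:20:00", "2016-01-01 08:21:00", "2016-01-31 16:58:00",
    "2016-01-31 16:59:00", "2016-01-31 17:00:00", "2016-01-31 17:01:00"
  ), tz = "UTC"),
  Energy = c(743.0253, 765.7225, 788.1493, 834.7815, 857.3012,
             3427.098, 3397.591, 3344.149, 3270.803),
  Station = c(rep("Ajmer", 5), rep("Kotada", 4))
)

# Bin each timestamp into the hour it falls in (the result is a factor)
df_utc$hourtime <- cut(df_utc$Datetime, breaks = "hour")

# Mean energy per station and hour; only combinations present in the data are
# returned, and rows with NA are dropped by the default na.action
hourly <- aggregate(Energy ~ Station + hourtime, data = df_utc, FUN = mean)
hourly <- hourly[order(hourly$Station, hourly$hourtime), ]

print(hourly)
```

Note that hourtime is a factor here, not a date-time, so it would need converting with as.POSIXct() before any further time arithmetic.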