简体   繁体   English

聚合函数产生每日而不是每小时平均值

[英]aggregate function produces daily instead of hourly mean

I have a data.frame with 15 minute time steps in the first column and 16 more columns full of data. 我在第一列中有一个15分钟的时间步长的data.frame,在另外16列中充满了数据。 I want to get the hourly mean for each column. 我想获取每一列的每小时平均值。 I am using aggregate and it works perfectly fine for 1 min data. 我正在使用聚合,并且对于1分钟的数据来说效果很好。

mydata <- list()
for(j in colnames(data_frame)){
  data_mean <- aggregate(data_frame[j], 
                        list(hour=cut(as.POSIXct(data_frame$TIME), "hour")),
                        mean, na.rm=TRUE)
  mydata[[j]] <- data_mean
}

When I use this same setup for a 15 min data set it gives me the daily mean instead of the hourly mean. 当我使用相同的设置进行15分钟的数据设置时,它给出的是每日平均值而不是每小时平均值。 Any idea why? 知道为什么吗?

My data looks like this for 1 min data: 我的数据看起来像这样1分钟的数据:

"TIME","Tair","RH"
2016-01-01 00:01:00,5.9,82
2016-01-01 00:02:00,5.9,82
2016-01-01 00:03:00,5.9,82
2016-01-01 00:04:00,5.89,82
2016-01-01 00:05:00,5.8,82
2016-01-01 00:06:00,5.8,82
2016-01-01 00:07:00,5.8,82
2016-01-01 00:08:00,5.8,82
2016-01-01 00:09:00,5.8,82
2016-01-01 00:10:00,5.8,82
2016-01-01 00:11:00,5.8,82
2016-01-01 00:12:00,5.8,82
2016-01-01 00:13:00,5.8,82
2016-01-01 00:14:00,5.8,82
2016-01-01 00:15:00,5.8,82
2016-01-01 00:16:00,5.8,82
2016-01-01 00:17:00,5.8,82
2016-01-01 00:18:00,5.8,82
2016-01-01 00:19:00,5.8,82
2016-01-01 00:20:00,5.8,82
2016-01-01 00:21:00,5.75,82
2016-01-01 00:22:00,5.78,82
2016-01-01 00:23:00,5.78,83
2016-01-01 00:24:00,5.8,82
2016-01-01 00:25:00,5.73,82
2016-01-01 00:26:00,5.7,82
2016-01-01 00:27:00,5.7,82
2016-01-01 00:28:00,5.7,82
2016-01-01 00:29:00,5.7,82
2016-01-01 00:30:00,5.7,82
2016-01-01 00:31:00,5.7,83
2016-01-01 00:32:00,5.76,83
2016-01-01 00:33:00,5.8,83
2016-01-01 00:34:00,5.8,82
2016-01-01 00:35:00,5.8,82
2016-01-01 00:36:00,5.8,83
2016-01-01 00:37:00,5.79,83
2016-01-01 00:38:00,5.7,82

And for 15 min data: 对于15分钟的数据:

"TIME","Tair","RH"
2016-01-01 00:15:00,6.228442,80.40858
2016-01-01 00:30:00,6.121088,81.00000
2016-01-01 00:45:00,6.075000,NA
2016-01-01 01:00:00,5.951910,NA
2016-01-01 01:15:00,5.844144,NA
2016-01-01 01:30:00,5.802242,NA
2016-01-01 01:45:00,5.747619,NA
2016-01-01 02:00:00,5.742889,NA
2016-01-01 02:15:00,5.752584,81.12135
2016-01-01 02:30:00,5.677753,81.00000
2016-01-01 02:45:00,5.500224,81.61435
2016-01-01 03:00:00,5.225282,82.29797
2016-01-01 03:15:00,5.266441,83.00000
2016-01-01 03:30:00,5.200448,83.32584
2016-01-01 03:45:00,5.098876,84.00000
2016-01-01 04:00:00,5.081061,83.76894
2016-01-01 04:15:00,5.230769,82.88664
2016-01-01 04:30:00,5.300000,82.06742
2016-01-01 04:45:00,5.300000,NA
2016-01-01 05:00:00,5.399776,NA

Your code works for me. 您的代码对我有用。

However, your loop is slightly wasteful in that it repeatedly computes the cut of the TIME column for every column of the data.frame. 但是,您的循环有点浪费,因为它为data.frame的每一列重复计算TIME列的剪切。 You could precompute it, but there's a better solution. 您可以对其进行预先计算,但是有更好的解决方案。

You can produce the same result but in a simpler, more conventional, and more useful form with a single call to aggregate() : 您可以通过调用aggregate()以更简单,更常规,更有用的形式产生相同的结果:

aggregate(df1[names(df1)!='TIME'],list(hour=cut(df1$TIME,'hour')),mean,na.rm=T);
##         hour     Tair       RH
## 1 2016-01-01 5.786316 82.15789
aggregate(df15[names(df15)!='TIME'],list(hour=cut(df15$TIME,'hour')),mean,na.rm=T);
##                  hour     Tair       RH
## 1 2016-01-01 00:00:00 6.141510 80.70429
## 2 2016-01-01 01:00:00 5.836479      NaN
## 3 2016-01-01 02:00:00 5.668362 81.24523
## 4 2016-01-01 03:00:00 5.197762 83.15595
## 5 2016-01-01 04:00:00 5.227957 82.90767
## 6 2016-01-01 05:00:00 5.399776      NaN

Data 数据

df1 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:01:00','2016-01-01 00:02:00',
'2016-01-01 00:03:00','2016-01-01 00:04:00','2016-01-01 00:05:00','2016-01-01 00:06:00',
'2016-01-01 00:07:00','2016-01-01 00:08:00','2016-01-01 00:09:00','2016-01-01 00:10:00',
'2016-01-01 00:11:00','2016-01-01 00:12:00','2016-01-01 00:13:00','2016-01-01 00:14:00',
'2016-01-01 00:15:00','2016-01-01 00:16:00','2016-01-01 00:17:00','2016-01-01 00:18:00',
'2016-01-01 00:19:00','2016-01-01 00:20:00','2016-01-01 00:21:00','2016-01-01 00:22:00',
'2016-01-01 00:23:00','2016-01-01 00:24:00','2016-01-01 00:25:00','2016-01-01 00:26:00',
'2016-01-01 00:27:00','2016-01-01 00:28:00','2016-01-01 00:29:00','2016-01-01 00:30:00',
'2016-01-01 00:31:00','2016-01-01 00:32:00','2016-01-01 00:33:00','2016-01-01 00:34:00',
'2016-01-01 00:35:00','2016-01-01 00:36:00','2016-01-01 00:37:00','2016-01-01 00:38:00')),
Tair=c(5.9,5.9,5.9,5.89,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.8,5.75,
5.78,5.78,5.8,5.73,5.7,5.7,5.7,5.7,5.7,5.7,5.76,5.8,5.8,5.8,5.8,5.79,5.7),RH=c(82L,82L,82L,
82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,82L,83L,82L,82L,82L,
82L,82L,82L,82L,83L,83L,83L,82L,82L,83L,83L,82L));

df15 <- data.frame(TIME=as.POSIXct(c('2016-01-01 00:15:00','2016-01-01 00:30:00',
'2016-01-01 00:45:00','2016-01-01 01:00:00','2016-01-01 01:15:00','2016-01-01 01:30:00',
'2016-01-01 01:45:00','2016-01-01 02:00:00','2016-01-01 02:15:00','2016-01-01 02:30:00',
'2016-01-01 02:45:00','2016-01-01 03:00:00','2016-01-01 03:15:00','2016-01-01 03:30:00',
'2016-01-01 03:45:00','2016-01-01 04:00:00','2016-01-01 04:15:00','2016-01-01 04:30:00',
'2016-01-01 04:45:00','2016-01-01 05:00:00')),Tair=c(6.228442,6.121088,6.075,5.95191,
5.844144,5.802242,5.747619,5.742889,5.752584,5.677753,5.500224,5.225282,5.266441,5.200448,
5.098876,5.081061,5.230769,5.3,5.3,5.399776),RH=c(80.40858,81,NA,NA,NA,NA,NA,NA,81.12135,81,
81.61435,82.29797,83,83.32584,84,83.76894,82.88664,82.06742,NA,NA));

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 将每小时数据转换为每日最大值,最小值,平均值 - Convert hourly data to daily maximum, minimum, mean 使用 r 中的每小时数据计算每日平均值 - Calculate daily mean with hourly data in r 每个参数的每小时数据的每日平均值 - Daily mean of hourly data per parameter R每小时转换为每日数据,最多为0:00而不是23:00 - R convert hourly to daily data up to 0:00 instead of 23:00 找到一个更优雅的方法是使用Zoo将小时数据汇总为小时数据 - Finding a more elegant was to aggregate hourly data to mean hourly data using zoo R从完整的市场交易历史中找出每小时均值与平均每日价格的差异 - R finding the hourly mean difference from mean daily price from a complete history of market trades 将每小时堆叠的多元小时数据汇总为每日最大值,并在R中将data.table表示为平均值 - Aggregate stacked multivariate hourly data into daily maximum, and means in R with data.table 根据不遵循 0 - 24 的开始和结束的每小时数据计算平均每日降雨量 - Calculate mean daily rainfall from hourly data with start and end that doesn't follow 0 - 24 汇总功能以分组,计数和均值 - Aggregate function to group,count and mean 使用hourly.apply函数将10分钟数据聚合为每小时平均值 - Aggregating 10 minute data to hourly mean with the hourly.apply function fails
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM