
summaryBy function on two arguments

I have daily data running from 1971 to 2099, organized as follows:

YEAR;MONTH;DAY;RES1;RES2
1971;1;1;1206.1;627
1971;1;2;1303.4;654.3
1971;1;3;1248.9;662
1971;1;4;1188.8;666.8
1971;1;5;1055.2;667.8
1971;1;6;987.1;663.3
1971;1;7;939.2;655.1
1971;1;8;883.2;644.4
1971;1;9;844.1;632.6
1971;1;10;813.2;620.7
1971;1;11;786.4;609
1971;1;12;765.9;598.2
1971;1;13;990.2;650.1
1971;1;14;1374.4;698.9
1971;1;15;1335.9;718
1971;1;16;1193.2;721.6
1971;1;17;1043.5;719.5
1971;1;18;995.7;710.9
1971;1;19;937.2;696.2
1971;1;20;877;678.2
1971;1;21;880.2;676.5
1971;1;22;1227.2;715.3
1971;1;23;1275.7;731.1
1971;1;24;1029.2;730.7
1971;1;25;934.2;724.9
1971;1;26;923.6;714.8
1971;1;27;887.6;700.1
1971;1;28;840.2;682.6
1971;1;29;791.7;664.3
1971;1;30;746.7;646.4
1971;1;31;706.8;629.3

Using this data, I need to calculate several average values, such as the monthly average. To calculate the monthly average I used the summaryBy() function from the doBy package. The following code gives me the monthly average values:

library(doBy)  # summaryBy() comes from the doBy package

# Rows of the reference period (1975-2004) and of the end period (2070-2099)
indREF=which(data$YEAR > 1974 & data$YEAR < 2005)
indEND=which(data$YEAR > 2069)
dataREF=data[indREF,]
dataEND=data[indEND,]

# First column of the result: the month numbers present in each period
MoyRef=c(summaryBy(dataREF[,"MONTH"]~MONTH, dataREF, FUN = function(x) {return(mean(x,na.rm=TRUE))})[,1])
MoyEnd=c(summaryBy(dataEND[,"MONTH"]~MONTH, dataEND, FUN = function(x) {return(mean(x,na.rm=TRUE))})[,1])

# Append one column of monthly means (NA ignored) per value column: RES1, RES2, ...
for (i in 4:dim(data)[2])
{
  MoyRef=cbind(MoyRef,summaryBy(dataREF[,i]~MONTH, dataREF, FUN = function(x) {return(mean(x,na.rm=TRUE))})[,2])
  MoyEnd=cbind(MoyEnd,summaryBy(dataEND[,i]~MONTH, dataEND, FUN = function(x) {return(mean(x,na.rm=TRUE))})[,2])
}
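For comparison, the same per-period monthly means can be computed in base R without the loop and the cbind() bookkeeping. This is a minimal sketch, assuming `data` has the columns YEAR, MONTH, DAY, RES1 and RES2 shown above; the small data frame and the helper `monthly_mean()` are hypothetical, made up for illustration:

```r
# Hypothetical toy data with the question's column layout (values are made up)
data <- data.frame(
  YEAR  = rep(c(1975, 2080), each = 4),
  MONTH = rep(c(1, 1, 2, 2), times = 2),
  DAY   = rep(c(1, 2), times = 4),
  RES1  = c(10, 20, 30, 40, 50, 60, 70, NA),
  RES2  = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Same period filters as in the question's code
dataREF <- subset(data, YEAR > 1974 & YEAR < 2005)
dataEND <- subset(data, YEAR > 2069)

# Hypothetical helper: mean of every non-date column per month.
# na.rm = TRUE is passed through aggregate() to mean().
monthly_mean <- function(d) {
  value_cols <- setdiff(names(d), c("YEAR", "MONTH", "DAY"))
  aggregate(d[value_cols], by = list(MONTH = d$MONTH), FUN = mean, na.rm = TRUE)
}

MoyRef <- monthly_mean(dataREF)
MoyEnd <- monthly_mean(dataEND)
print(MoyRef)
print(MoyEnd)
```

Each result is a data frame with a MONTH column followed by one mean column per value column, so adding a RES3 column needs no change to the code.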

But now, since the data runs from 1971 to 2099 at a daily time step, I would like to calculate the daily average, i.e. the mean of each calendar day (each MONTH/DAY pair) across the years, so that the output looks like this:

MONTH;DAY;AVERAGE_RES1;AVERAGE_RES2
01;01;VALUE1;VALUE2
01;02;VALUE3;VALUE4
...
12;31;VALUEx;VALUEx

Does anyone have an idea how to achieve that?

Unfortunately, the sample data is not suitable for testing, as it contains only January of a single year, so there is not much to average. However, this should do the job:

aggregate(data[c("RES1", "RES2")], by = list(data$MONTH, data$DAY), FUN = "mean")
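Since the posted sample covers only one January, one way to check the aggregate() approach is on a synthetic multi-year daily series. The sketch below builds such a series (deterministic, made-up values, not the asker's data) and runs a slightly extended version of the call above: the `by` list is named so the output columns are MONTH and DAY rather than Group.1 and Group.2, missing values are ignored, and the rows are sorted and renamed into the requested MONTH;DAY layout:

```r
# Synthetic daily series 1971-1974 (made-up values); seq() on Dates handles leap years
dates <- seq(as.Date("1971-01-01"), as.Date("1974-12-31"), by = "day")
data <- data.frame(
  YEAR  = as.integer(format(dates, "%Y")),
  MONTH = as.integer(format(dates, "%m")),
  DAY   = as.integer(format(dates, "%d"))
)
data$RES1 <- data$YEAR - 1970   # 1, 2, 3 or 4 depending on the year
data$RES2 <- data$DAY * 10      # depends only on the day of the month

# Named 'by' list -> output columns MONTH and DAY; na.rm = TRUE goes to mean()
daily <- aggregate(data[c("RES1", "RES2")],
                   by = list(MONTH = data$MONTH, DAY = data$DAY),
                   FUN = mean, na.rm = TRUE)

# aggregate() varies the first grouping variable fastest, so sort explicitly
daily <- daily[order(daily$MONTH, daily$DAY), ]
names(daily)[3:4] <- c("AVERAGE_RES1", "AVERAGE_RES2")
rownames(daily) <- NULL

head(daily, 3)
nrow(daily)  # 366 calendar days, including 29 February
```

Note that 29 February forms its own group, and its mean is based on leap years only. To get the daily means of one period, pass a subset instead of the full data, e.g. `data[data$YEAR > 1974 & data$YEAR < 2005, ]` for the reference period used in the question.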

I think you should try this with the dplyr package, like this:

library(dplyr)
# Group by calendar day and average RES1 and RES2 ('data' is the question's data frame)
data %>% group_by(MONTH,DAY) %>% summarise_each_(funs(mean),c("RES1","RES2"))
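summarise_each_() and funs() are deprecated in current dplyr releases. A sketch of the same grouping with across(), assuming dplyr 1.0.0 or later; the small data frame is made up, and the AVERAGE_ prefix is chosen only to match the requested output header:

```r
library(dplyr)

# Made-up data with the question's column layout (one missing value)
data <- data.frame(
  YEAR  = c(1971, 1971, 1972, 1972),
  MONTH = c(1, 1, 1, 1),
  DAY   = c(1, 2, 1, 2),
  RES1  = c(1206.1, 1303.4, 1000, NA),
  RES2  = c(627, 654.3, 673, 700.7)
)

daily <- data %>%
  group_by(MONTH, DAY) %>%
  # across() applies mean() to each listed column; .names sets the output names
  summarise(across(c(RES1, RES2), ~ mean(.x, na.rm = TRUE),
                   .names = "AVERAGE_{.col}"),
            .groups = "drop") %>%
  arrange(MONTH, DAY)

print(daily)
```

`.groups = "drop"` returns an ungrouped tibble, which avoids the regrouping message that summarise() otherwise prints.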

A dplyr answer has been posted, and a data.table answer will probably follow soon. I still stand by my "R without packages" answer using aggregate(). Even though dplyr and data.table obviously have their justification, I like the idea of sqldf: you learn SQL syntax once and can then use it for the rest of your life. Other languages and packages come and go, but SQL, like base R, is here to stay. Thus:

library(sqldf)
sqldf("SELECT MONTH, DAY, AVG(RES1) AS AVERAGE_RES1, AVG(RES2) AS AVERAGE_RES2 FROM data GROUP BY MONTH, DAY ORDER BY MONTH, DAY")
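Whichever approach computes the means, the question's target layout (`01;01;VALUE1;VALUE2`) still needs zero-padded month and day numbers and a semicolon separator. A minimal base-R sketch, assuming the result is a data frame named `daily` with columns MONTH, DAY, AVERAGE_RES1 and AVERAGE_RES2; the values below are made up, and the temporary file stands in for whatever output path you actually use:

```r
# Made-up result in the shape produced by the answers above
daily <- data.frame(
  MONTH = c(1, 1, 12),
  DAY   = c(1, 2, 31),
  AVERAGE_RES1 = c(1103.05, 1303.4, 900),
  AVERAGE_RES2 = c(650, 677.5, 700)
)

# Zero-pad MONTH and DAY to two digits ("01", "02", ..., "12")
daily$MONTH <- sprintf("%02d", as.integer(daily$MONTH))
daily$DAY   <- sprintf("%02d", as.integer(daily$DAY))

# Semicolon-separated, no quotes, no row names; replace the temp file with your own path
out_file <- tempfile(fileext = ".csv")
write.table(daily, out_file, sep = ";", quote = FALSE, row.names = FALSE)

cat(readLines(out_file), sep = "\n")
```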
