
Summing rows by month in R

I have a data frame with a date column, an hour column, and a series of other numeric columns. Each row represents one hour of one day, and the data covers an entire year.

The data frame looks like this:

          Date  Hour  Melbourne  Southern  Flagstaff
1   2009-05-01     0          0         5         17
2   2009-05-01     2          0         2          1
3   2009-05-01     1          0        11          0
4   2009-05-01     3          0         3          8
5   2009-05-01     4          0         1          0
6   2009-05-01     5          0        49         79
7   2009-05-01     6          0       425        610

The hours are out of order because this is a subset of another data frame.

I would like to sum the values in the numeric columns by month, and possibly by day. Does anyone know how I can do this?

I created the data set with:

data <- read.table( text="   Date    Hour    Melbourne   Southern    Flagstaff
                       1   2009-05-01  0   0   5   17
                       2   2009-05-01  2   0   2   1
                       3   2009-05-01  1   0   11  0
                       4   2009-05-01  3   0   3   8
                       5   2009-05-01  4   0   1   0
                       6   2009-05-01  5   0   49  79
                       7   2009-05-01  6   0   425 610",
                    header=TRUE,stringsAsFactors=FALSE)

You can do the summation with the function aggregate:

byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
             data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
             data=data,FUN=sum)

Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier), the arguments do the following:

  • FUN is the function used for the aggregation. I use sum to add up the values, but it could also be mean, max, or a function you wrote yourself.
  • data specifies the data frame that I want to aggregate.
  • The first argument is a formula that tells the function what exactly I want to aggregate. On the left side of the ~ are the variables to aggregate. If there is more than one, they are combined with cbind. On the right side is the variable by which the data should be split. Putting Date there means that aggregate will sum the variables for each distinct value of Date.
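As an illustration of the FUN argument described above, here is a minimal sketch that swaps sum for max and for a small anonymous function. It rebuilds the question's sample data without the row-number column; the names daymax and dayrange are only for illustration.

```r
# Sample data from the question (row numbers omitted)
data <- read.table(text = "Date Hour Melbourne Southern Flagstaff
2009-05-01 0 0 5 17
2009-05-01 2 0 2 1
2009-05-01 1 0 11 0
2009-05-01 3 0 3 8
2009-05-01 4 0 1 0
2009-05-01 5 0 49 79
2009-05-01 6 0 425 610", header = TRUE, stringsAsFactors = FALSE)

# Daily maximum instead of the daily sum
daymax <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ Date,
                    data = data, FUN = max)

# A function you write yourself: spread between busiest and quietest hour
dayrange <- aggregate(cbind(Melbourne, Southern, Flagstaff) ~ Date,
                      data = data, FUN = function(x) max(x) - min(x))
```

FUN is applied to each column on the left side of the formula separately, once per distinct Date.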

For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month of a given date. You may first need to install the package with install.packages("lubridate").

If you prefer not to use lubridate, you could do the following instead:

data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
                     data=data,FUN=sum)

Here I added a new column to data that contains the month and then aggregated by that column.
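One caveat with a month-only key: if the data ever spans more than one calendar year, the same month from different years would be summed together. A possible sketch, assuming a "YYYY-MM" key is acceptable, is shown here; data2 and the yearmonth column are hypothetical and not part of the question.

```r
# Hypothetical data covering the same month in two different years
data2 <- data.frame(
  Date      = c("2009-05-01", "2009-05-02", "2010-05-01"),
  Melbourne = c(1, 2, 10),
  Southern  = c(5, 5, 20),
  stringsAsFactors = FALSE
)

# Build a "YYYY-MM" key so that May 2009 and May 2010 stay separate
data2 <- transform(data2, yearmonth = format(as.Date(Date), "%Y-%m"))

bymonth <- aggregate(cbind(Melbourne, Southern) ~ yearmonth,
                     data = data2, FUN = sum)
# 2009-05: Melbourne 3, Southern 10
# 2010-05: Melbourne 10, Southern 20
```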

Another way to do this is with data.table:

library(data.table)
# Edited as per Arun's comment
out = setDT(data)[, lapply(.SD, sum), by=Date] 

#>out
#         Date Hour Melbourne Southern Flagstaff
#1: 2009-05-01   21         0      496       715
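Note that the call above also sums the Hour column, which is rarely meaningful. A sketch of a monthly variant that restricts the sum to the count columns via .SDcols follows; the table dt and the vector counts are made up for illustration, and the "YYYY-MM" grouping key is an assumption about the desired output.

```r
library(data.table)

# Hypothetical data spanning two months
dt <- data.table(
  Date      = as.Date(c("2009-05-01", "2009-05-01", "2009-06-01")),
  Hour      = c(0, 1, 0),
  Melbourne = c(0, 0, 3),
  Southern  = c(5, 11, 7),
  Flagstaff = c(17, 0, 2)
)

# Columns to sum; Hour is deliberately left out
counts <- c("Melbourne", "Southern", "Flagstaff")

# Group by a "YYYY-MM" key computed on the fly
bymonth <- dt[, lapply(.SD, sum),
              by = .(month = format(Date, "%Y-%m")),
              .SDcols = counts]
```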

or by using dplyr:

library(dplyr)
# summarise_each() and funs() are deprecated; across() requires dplyr >= 1.0.0
out = data %>% group_by(Date) %>% summarise(across(everything(), sum))

#>out
#        Date Hour Melbourne Southern Flagstaff
#1 2009-05-01   21         0      496       715
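The same idea can be extended to monthly totals. This is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame df and the "YYYY-MM" month column are hypothetical.

```r
library(dplyr)

# Hypothetical data spanning two months
df <- data.frame(
  Date      = c("2009-05-01", "2009-05-01", "2009-06-01"),
  Hour      = c(0, 1, 0),
  Melbourne = c(0, 0, 3),
  Southern  = c(5, 11, 7),
  Flagstaff = c(17, 0, 2),
  stringsAsFactors = FALSE
)

bymonth <- df %>%
  mutate(month = format(as.Date(Date), "%Y-%m")) %>%  # "YYYY-MM" key
  group_by(month) %>%
  summarise(across(c(Melbourne, Southern, Flagstaff), sum),
            .groups = "drop")
```

Listing the columns inside across() keeps Hour out of the sums.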

Another base R solution:

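# dat is the question's data as a plain data.frame (not a data.table)
# to sum by date
rowsum(dat[-1], dat$Date)
#           Hour Melbourne Southern Flagstaff
#2009-05-01   21         0      496       715

# or by month and year (as.Date() is needed if Date was read as character)
rowsum(dat[-1], format(as.Date(dat$Date), "%b-%y"))
#       Hour Melbourne Southern Flagstaff
#May-09   21         0      496       715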

I would use dplyr::summarize with group_by and sum each numeric column (here df is the data frame from the question):

summarize(group_by(df, Date), m_count = sum(Melbourne), s_count = sum(Southern), f_count = sum(Flagstaff))
