Aggregate Weekly Data in R

I am sure this is straightforward, but I just can't seem to get it to work. I have a data frame of daily totals. I simply want to sum the totals by week, keeping a zero for any week that has no data. What is the best approach in R? In case it matters, I read the data in from a CSV file and converted the date column to Date class once in R.

Here is the structure of my data frame p1:

'data.frame':   407 obs. of  2 variables:
 $ date:Class 'Date'  num [1:407] 14335 14336 14337 14340 14341 ...
 $ amt : num  45 150 165 165 45 45 150 150 15 165 ...

and the first few...

> head(p1)
        date amt
1 2009-04-01  45
2 2009-04-02 150
3 2009-04-03 165
4 2009-04-06 165
5 2009-04-07  45
6 2009-04-08  45

Many thanks in advance.

One note: I saw one previous post on this but couldn't get it to work.

A solution with the lubridate library:

library(lubridate)
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45
2009-05-15,45"
df <- read.csv(textConnection(Lines))

If you don't need 0 for missing weeks it's simple:

weeks <- week(df$date)
sums <- tapply(df$amt, weeks, sum)
# 14  15  16  20 
#360 210  45  45 

To put zeros for missing weeks:

span <- min(weeks):max(weeks)
out <- array(0, dim = length(span), dimnames = list(span))
out[dimnames(sums)[[1]]] <- sums
# 14  15  16  17  18  19  20 
#360 210  45   0   0   0  45 
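
One caveat with week-of-year numbers is that they restart every January, so a data frame covering more than one year would mix weeks from different years under the same number. A minimal base R sketch that avoids this, assuming weeks that start on Monday (the default for cut() on dates) and reusing the sample data above:

```r
# Same sample data as above, parsed with base R only.
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45
2009-05-15,45"
df <- read.csv(text = Lines)
df$date <- as.Date(as.character(df$date))

# cut() labels each date with the Monday that starts its week and creates
# a level for every week in the range, including weeks with no rows.
wk <- cut(df$date, breaks = "week")

# split() keeps the empty levels, and sum() of an empty vector is 0,
# so missing weeks come out as zero without a separate fill step.
weekly <- vapply(split(df$amt, wk), sum, numeric(1))
weekly
```

For the sample data this should give seven weeks, labelled 2009-03-30 through 2009-05-11, with sums 360, 0, 255, 0, 0, 0 and 45. The names are the week-start dates, so they stay unique across year boundaries.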

Here is a solution that reads in the data, aggregates it by week, and fills in the missing weeks with zero, all in three lines of code. read.zoo reads the data assuming a header and a comma field separator. It converts the first column to Date class and then maps each date to the Friday on or after it. The nextfri function that does this mapping is taken from the zoo-quickref vignette in the zoo package.

(If you want the week to end on a different day, replace every 5 in the formula with another day number. The idea is that, relative to the UNIX Epoch, day d-4 falls on day of the week d, where d=0 is Sunday, d=1 is Monday, ..., d=6 is Saturday, so any multiple of 7 days from that also falls on day of the week d.)

The read.zoo command also aggregates all points that have the same index (remember that every date has been mapped to the Friday ending its week, so all points in the same week now share that Friday as their index). The next command creates a zero-width zoo object that has all the weeks from the first to the last and merges it with the output of read.zoo using fill = 0, so that the filled-in weeks get that value.

Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45"
library(zoo)
nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",", 
    FUN = as.Date, FUN2 = nextfri, aggregate = sum)
merge(z, zoo(, seq(min(time(z)), max(time(z)), 7)), fill = 0)

We used textConnection(Lines) above to make the example self-contained, so that you can copy it and paste it straight into your session. In practice, textConnection(Lines) would be replaced with the name of your file, e.g. "myfile.csv".

For the input above the output would be the following zoo object:

2009-04-03 2009-04-10 2009-04-17 
       360          0        255
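
The date arithmetic behind nextfri can be checked in base R without zoo. The sketch below generalizes it to any week-ending day; week_ending and its argument d are hypothetical names introduced here for illustration, and the origin is passed explicitly because older versions of base R require it when converting a number to a Date.

```r
# Hypothetical helper: map each date to the first day-of-week `d` on or
# after it, where d = 0 is Sunday, ..., d = 5 is Friday, d = 6 is Saturday.
week_ending <- function(x, d = 5) {
  # Day d - 4 after 1970-01-01 (a Thursday) falls on day of the week d.
  anchor <- as.Date(d - 4, origin = "1970-01-01")
  # Round the distance from the anchor up to a whole number of weeks.
  7 * ceiling(as.numeric(x - anchor) / 7) + anchor
}

dates <- as.Date(c("2009-04-01", "2009-04-03", "2009-04-04"))
fridays <- week_ending(dates)         # d = 5, same mapping as nextfri
sundays <- week_ending(dates, d = 0)  # weeks ending on Sunday instead
```

Here the first two dates should map to Friday 2009-04-03 and the third to 2009-04-10, which matches the index of the zoo output above.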

There are three vignettes that come with the zoo package that you might want to read.
