
Grouping by year in R

I have a dataset that I want to group by year (summing over days), but if the number of days for a given date exceeds the number of days that have elapsed in that year so far, the extra days should be counted towards the previous year. For example, below, of the 153 days associated with 2019-02-01, 31 should go towards 2019 and 122 towards 2018.

Data

dat <- data.frame(date = as.Date( c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01", "2019-02-01", "2019-03-01", "2019-04-01") ),
                  days = c(0, 120, 30, 62, 153, 28, 31))

date         days
2018-02-01   0
2018-06-01   120
2018-07-01   30
2018-09-01   62
2019-02-01   153
2019-03-01   28
2019-04-01   31

Expected output

year   days
2018   334
2019   90

How can I do this in R? (Ideally using dplyr, but base R is fine if that's the only way.)

Here is one way using base R:

#Get day of the year
dat$day_in_year <- as.integer(format(dat$date, "%j"))
#Get year from date
dat$year <- as.integer(format(dat$date, "%Y"))
#Index where days exceeds the days elapsed so far in the year (day_in_year - 1)
inds <- dat$day_in_year <= dat$days
#Create a new dataframe with adjusted values
other_df <- data.frame(days = dat$days[inds] - dat$day_in_year[inds] + 1, 
                       year = dat$year[inds] - 1)
#Update the original data
dat$days[inds] <- dat$day_in_year[inds] - 1

#Combine the two dataframe then aggregate
aggregate(days~year, rbind(dat[c('days', 'year')], other_df), sum)

#  year days
#1 2018  334
#2 2019   90
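
As a quick check of the + 1 / - 1 adjustments above, the split for the 2019-02-01 row can be worked out by hand. This is only an illustrative sketch; the names elapsed, this_year and prev_year are mine, not part of the answer. Because %j is 1-based (January 1 is day 1), the number of days elapsed before the date is day_in_year - 1:

```r
# Worked example for a single row: 153 days recorded on 2019-02-01
d <- as.Date("2019-02-01")
days <- 153

day_in_year <- as.integer(format(d, "%j"))  # 1-based day of the year
elapsed <- day_in_year - 1                  # days already elapsed in 2019

this_year <- min(days, elapsed)  # part that stays in 2019
prev_year <- days - this_year    # part that moves back to 2018

c(this_year = this_year, prev_year = prev_year)
```

The two parts (31 and 122) match the split described in the question.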

A possible tidyverse way:

library(tidyverse)

dat %>% group_by(year = as.integer(format(date, '%Y'))) %>%
  mutate(excess = days - (date - as.Date(paste0(year, '-01-01'))),
    days = ifelse(excess > 0, days - excess, days)) %>%
  summarise(days = sum(days), excess = as.integer(sum(excess[excess > 0]))) %>%
  ungroup %>%
  complete(year = seq(min(year), max(year)), fill = list(excess = 0)) %>%
  mutate(days = days + lead(excess, default = 0), excess = NULL)

Output:

# A tibble: 2 x 2
  year   days
  <int> <dbl>
1 2018    334
2 2019     90
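
A shorter dplyr variant may also be worth considering: split each row into the part that stays in its own year and the part carried back, stack the two, then sum by year. This is a minimal sketch rather than a tested general solution; it assumes no row spills back more than one year, and parts, elapsed and carry are illustrative names.

```r
library(dplyr)

dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

parts <- dat %>%
  mutate(year    = as.integer(format(date, "%Y")),
         elapsed = as.integer(format(date, "%j")) - 1L,  # days elapsed before this date
         carry   = pmax(days - elapsed, 0))              # days belonging to the previous year

res <- bind_rows(
  transmute(parts, year, days = days - carry),               # part kept in the row's own year
  parts %>% filter(carry > 0) %>%
    transmute(year = year - 1L, days = carry)                # part moved back one year
) %>%
  group_by(year) %>%
  summarise(days = sum(days))

res
```

For the example data this gives 334 days for 2018 and 90 for 2019, matching the expected output.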

The basic idea is to use tapply, taking the year from the first four characters of the date with substr. This gives the plain per-year totals, before any days are moved back:

data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
#      days
# 2018  212
# 2019  212

If the year is needed as a column, aggregate is probably the better choice.

with(dat, aggregate(list(days=days), list(date=substr(date, 1, 4)), sum))
#   date days
# 1 2018  212
# 2 2019  212

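To move the excess back one year, we can write a function fun that computes the days elapsed since January 1 of each date's year, and subtract that from days to get the transfers tr. Positive values of tr are the days that belong to the previous year.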

fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr <- with(dat, as.numeric(days - fun(date)))

tapply solution (the c(1, -1) step relies on the example having exactly two years and a single positive transfer):

res <- data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
transform(res, days=days + tr[tr > 0] * c(1, -1))

#      days
# 2018  334
# 2019   90

Similarly, using aggregate:

res2 <- with(dat, aggregate(list(days=days), 
                            list(date=substr(date, 1, 4)), sum))
transform(res2, days=days + tr[tr > 0] * c(1, -1))
#   date days
# 1 2018  334
# 2 2019   90
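
For data with more years or several spill-over rows, a possible variant is to assign each moved amount to its own target year before summing. A minimal sketch, assuming no row spills back more than one year; yr, elapsed, moved and totals are illustrative names, not part of the answer above:

```r
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

yr <- as.integer(substr(dat$date, 1, 4))
# Days elapsed since January 1 of each row's year
elapsed <- as.numeric(dat$date - as.Date(paste0(yr, "-01-01")))
# Days to move back one year (0 where nothing spills over)
tr <- pmax(dat$days - elapsed, 0)
moved <- tr > 0

# Sum the retained days under each row's own year and the
# moved days under the previous year
totals <- tapply(c(dat$days - tr, tr[moved]),
                 c(yr, yr[moved] - 1L), sum)

res <- data.frame(year = as.integer(names(totals)),
                  days = as.vector(totals))
res
```

Because each transfer carries its own year label, the result does not depend on how many years or transfers the data contain.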

Data:

dat <- structure(list(date = structure(c(17563, 17683, 17713, 17775, 
17928, 17956, 17987), class = "Date"), days = c(0, 120, 30, 62, 
153, 28, 31)), class = "data.frame", row.names = c(NA, -7L))
