I have a dataset where I want to group by year (and sum over days), but if the number of days for a certain date is more than the number of days that have occurred in the year so far, the extra days should be added to the previous year. For example, below, out of the 153 days associated with 2019-02-01, 31 of the days should go towards 2019 and 122 should go towards 2018.
Data
dat <- data.frame(date = as.Date( c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01", "2019-02-01", "2019-03-01", "2019-04-01") ),
days = c(0, 120, 30, 62, 153, 28, 31))
date days
2018-02-01 0
2018-06-01 120
2018-07-01 30
2018-09-01 62
2019-02-01 153
2019-03-01 28
2019-04-01 31
Expected output
year days
2018 334
2019 90
How can I do this in R? (Ideally using dplyr, but base R is fine if that's the only way.)
Here is one way using base R:
#Get day of the year
dat$day_in_year <- as.integer(format(dat$date, "%j"))
#Get year from date
dat$year <- as.integer(format(dat$date, "%Y"))
#Index where days exceeds the days elapsed so far in the year (day of year - 1)
inds <- dat$days > dat$day_in_year - 1
#Create a new dataframe with adjusted values
other_df <- data.frame(days = dat$days[inds] - dat$day_in_year[inds] + 1,
year = dat$year[inds] - 1)
#Update the original data
dat$days[inds] <- dat$day_in_year[inds] - 1
#Combine the two dataframe then aggregate
aggregate(days~year, rbind(dat[c('days', 'year')], other_df), sum)
# year days
#1 2018 334
#2 2019 90
A possible tidyverse way:
library(tidyverse)
dat %>% group_by(year = as.integer(format(date, '%Y'))) %>%
mutate(excess = days - (date - as.Date(paste0(year, '-01-01'))),
days = ifelse(excess > 0, days - excess, days)) %>%
summarise(days = sum(days), excess = as.integer(sum(excess[excess > 0]))) %>%
ungroup %>%
complete(year = seq(min(year), max(year)), fill = list(excess = 0)) %>%
mutate(days = days + lead(excess, default = 0), excess = NULL)
Output:
# A tibble: 2 x 2
year days
<int> <dbl>
1 2018 334
2 2019 90
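As a hedged alternative sketch (not the answer's own method), the same result can be reached with dplyr alone by splitting each row into a part that stays in its own year and a part carried to the previous year, then stacking the two and summing. The column names elapsed, kept and carried are made up for illustration, and the sketch assumes a count never spills back more than one year.

```r
library(dplyr)

dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

split_dat <- dat %>%
  mutate(year    = as.integer(format(date, "%Y")),
         elapsed = as.integer(format(date, "%j")) - 1L, # days before `date` in its year
         kept    = pmin(days, elapsed),                 # part that stays in the date's year
         carried = pmax(days - elapsed, 0))             # part that moves to the previous year

res_split <- bind_rows(
  # days that stay in the year of the date
  transmute(split_dat, year = year, days = kept),
  # excess days, re-labelled with the previous year (only rows that have an excess)
  split_dat %>% filter(carried > 0) %>% transmute(year = year - 1L, days = carried)
) %>%
  group_by(year) %>%
  summarise(days = sum(days))

res_split
```

Filtering on carried > 0 before stacking avoids creating an empty row for the year before the first date.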
Basically using tapply, getting the year from the first four characters with substr. This first step gives the plain yearly sums, without moving any days to the previous year yet.
data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
# days
# 2018 212
# 2019 212
If the year is needed as a column, it is probably better to use aggregate.
with(dat, aggregate(list(days=days), list(date=substr(date, 1, 4)), sum))
# date days
# 1 2018 212
# 2 2019 212
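To get the transfer one year back, we could write a function fun that subtracts January 1 of the same year from each date (giving the days elapsed so far), and use it to get the transfers tr: days minus days elapsed, where a positive value is the excess that belongs to the previous year.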
fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr <- with(dat, as.numeric(days - fun(date)))
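To make the transfer step concrete, here is a small check of what tr holds for the sample data. It re-creates dat, fun and tr as defined in this answer; only the row for 2019-02-01 is positive (153 - 31 = 122), so that is the only amount to move.

```r
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

# days elapsed since January 1 of the date's own year
fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
# days minus days elapsed; positive means excess for the previous year
tr <- with(dat, as.numeric(days - fun(date)))

tr
# [1]  -31  -31 -151 -181  122  -31  -59

dat$date[tr > 0]
# [1] "2019-02-01"
```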
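tapply solution (note that the c(1, -1) adjustment relies on this data having exactly two years and a single positive transfer):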
res <- data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
transform(res, days=days + tr[tr > 0] * c(1, -1))
# days
# 2018 334
# 2019 90
Similarly, using aggregate:
res2 <- with(dat, aggregate(list(days=days),
list(date=substr(date, 1, 4)), sum))
transform(res2, days=days + tr[tr > 0] * c(1, -1))
# date days
# 1 2018 334
# 2 2019 90
Data:
dat <- structure(list(date = structure(c(17563, 17683, 17713, 17775,
17928, 17956, 17987), class = "Date"), days = c(0, 120, 30, 62,
153, 28, 31)), class = "data.frame", row.names = c(NA, -7L))