
Grouping by year in R

I have a dataset that I want to group by year, summing days. However, if the number of days for a given date is larger than the number of days that have elapsed in that year so far, the extra days should be added to the previous year. For example, below, of the 153 days associated with 2019-02-01, 31 should go towards 2019 and 122 towards 2018.
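The splitting rule can be written as a small helper for a single date. This is a minimal sketch: split_days is a hypothetical name, and it assumes that the days "that have occurred in the year so far" are the days strictly before the date (day of year minus one), which is what gives 31 and 122 for 2019-02-01.

```r
# Hypothetical helper: split `days` between the year of `date` and the year before.
# Assumes "days elapsed so far" = day of year - 1 (e.g. 31 for 1 February).
split_days <- function(date, days) {
  elapsed  <- as.integer(format(date, "%j")) - 1L
  current  <- pmin(days, elapsed)  # part that fits into the current year
  previous <- days - current       # excess pushed back to the previous year
  c(current = current, previous = previous)
}

split_days(as.Date("2019-02-01"), 153)  # current = 31, previous = 122
```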


dat <- data.frame(date = as.Date( c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01", "2019-02-01", "2019-03-01", "2019-04-01") ),
                  days = c(0, 120, 30, 62, 153, 28, 31))

date         days
2018-02-01   0
2018-06-01   120
2018-07-01   30
2018-09-01   62
2019-02-01   153
2019-03-01   28
2019-04-01   31

Expected output:

year   days
2018   334
2019   90

How can I do this in R? (Ideally using dplyr, but base R is fine if that's the only way.)

Here is one way using base R:

# Day of the year (1 January = 1), so days already elapsed = day_in_year - 1
dat$day_in_year <- as.integer(format(dat$date, "%j"))
# Year of each date
dat$year <- as.integer(format(dat$date, "%Y"))
# Rows whose days exceed the days already elapsed in that year
inds <- dat$days > dat$day_in_year - 1
# Excess days, credited to the previous year
other_df <- data.frame(days = dat$days[inds] - dat$day_in_year[inds] + 1, 
                       year = dat$year[inds] - 1)
# Keep only the elapsed days in the current year
dat$days[inds] <- dat$day_in_year[inds] - 1

# Combine the two data frames, then sum per year
aggregate(days ~ year, rbind(dat[c('days', 'year')], other_df), sum)

#  year days
#1 2018  334
#2 2019   90

A possible tidyverse way:


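library(dplyr)
library(tidyr)  # for complete()

dat %>% group_by(year = as.integer(format(date, '%Y'))) %>%
  # excess = days beyond those already elapsed since 1 January
  mutate(excess = days - (date - as.Date(paste0(year, '-01-01'))),
         days = ifelse(excess > 0, days - excess, days)) %>%
  summarise(days = sum(days), excess = as.integer(sum(excess[excess > 0]))) %>%
  ungroup %>%
  # add missing years so that lead() always refers to the following year
  complete(year = seq(min(year), max(year)), fill = list(days = 0, excess = 0)) %>%
  # each year receives the excess of the following year
  mutate(days = days + lead(excess, default = 0), excess = NULL)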


# A tibble: 2 x 2
  year   days
  <chr> <dbl>
1 2018    334
2 2019     90
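Since dplyr was the preferred option, here is a shorter sketch of the same idea that avoids complete() and lead(): split each row into the part that fits in its own year and the excess, stack both parts, then sum per year. It assumes, as in the question's 2019-02-01 example, that the days elapsed are the day of year minus one; the column names elapsed, current and excess and the objects parts and res are illustrative. Like the answers here, it only moves excess back by one year.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

parts <- dat %>%
  mutate(
    year    = as.integer(format(date, "%Y")),
    elapsed = as.integer(format(date, "%j")) - 1L,  # days before `date` in its year
    current = pmin(days, elapsed),                  # stays in `year`
    excess  = days - current                        # moves to `year - 1`
  )

res <- bind_rows(
  parts %>% transmute(year, days = current),
  parts %>% filter(excess > 0) %>% transmute(year = year - 1L, days = excess)
) %>%
  group_by(year) %>%
  summarise(days = sum(days), .groups = "drop")

res  # 2018: 334, 2019: 90
```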

Basically, use tapply and take the year from the first four characters of the date with substr. On its own this gives the plain per-year sums, before any days are moved back:

data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
#      days
# 2018  212
# 2019  212

If the year is needed as a column, aggregate is probably the better choice:

with(dat, aggregate(list(days=days), list(date=substr(date, 1, 4)), sum))
#   date days
# 1 2018  212
# 2 2019  212

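To move the excess days back one year, we can write a function fun that returns the number of days elapsed since 1 January of each date's year; subtracting it from days gives the transfers tr (positive values are the days that spill over into the previous year).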

fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr <- with(dat, as.numeric(days - fun(date)))

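tapply solution (this relies on the example having exactly two years and a single positive transfer, because tr[tr > 0] * c(1, -1) adds it to the first year and subtracts it from the second):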

res <- data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
transform(res, days=days + tr[tr > 0] * c(1, -1))

#      days
# 2018  334
# 2019   90
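A sketch that avoids hard-coding c(1, -1), reusing fun and tr from above: each row keeps days minus its positive transfer in its own year, and each positive transfer is added as an extra entry for the previous year before summing. The names year, move, keep and res are illustrative, and, like the other answers, it only pushes excess back by one year.

```r
# Sample data from the question
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

fun  <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr   <- with(dat, as.numeric(days - fun(date)))
year <- as.integer(substr(dat$date, 1, 4))

move <- pmax(tr, 0)  # days to push back (0 when nothing spills over)
keep <- move > 0     # rows that actually transfer days

# Own-year remainders plus one extra entry per transfer, keyed by year - 1
res <- tapply(c(dat$days - move, move[keep]),
              c(year, year[keep] - 1L), sum)
data.frame(days = res)  # 2018: 334, 2019: 90
```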

Similarly, using aggregate:

res2 <- with(dat, aggregate(list(days=days), 
                            list(date=substr(date, 1, 4)), sum))
transform(res2, days=days + tr[tr > 0] * c(1, -1))
#   date days
# 1 2018  334
# 2 2019   90

Data:
dat <- structure(list(date = structure(c(17563, 17683, 17713, 17775, 
17928, 17956, 17987), class = "Date"), days = c(0, 120, 30, 62, 
153, 28, 31)), class = "data.frame", row.names = c(NA, -7L))
