Grouping by year in R
I have a dataset where I want to group by year (and sum over days), but if the number of days for a certain date is more than the number of days that have occurred in the year so far, the extra days should be added to the previous year. For example, below, out of the 153 days associated with 2019-02-01, 31 of the days should go towards 2019 and 122 should go towards 2018.
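The arithmetic in that example can be checked with a short sketch. The variable names (d, elapsed, this_year, prev_year) are only illustrative, and it assumes that "days that have occurred so far" means the days elapsed since 1 January of the same year:

```r
d    <- as.Date("2019-02-01")
days <- 153

# Days of 2019 that have elapsed before `d` (all of January)
elapsed <- as.numeric(d - as.Date(format(d, "%Y-01-01")))

this_year <- min(days, elapsed)   # part that stays in 2019
prev_year <- days - this_year     # part that moves back to 2018
```

Here this_year is 31 and prev_year is 122, matching the split described above.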
Data
dat <- data.frame(date = as.Date( c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01", "2019-02-01", "2019-03-01", "2019-04-01") ),
days = c(0, 120, 30, 62, 153, 28, 31))
date days
2018-02-01 0
2018-06-01 120
2018-07-01 30
2018-09-01 62
2019-02-01 153
2019-03-01 28
2019-04-01 31
Expected output
year days
2018 334
2019 90
How can I do this in R? (Ideally using dplyr, but base R is fine if that's the only way.)
Here is one way using base R:
#Get day of the year
dat$day_in_year <- as.integer(format(dat$date, "%j"))
#Get year from date
dat$year <- as.integer(format(dat$date, "%Y"))
#Index where days exceeds the days elapsed so far in the year (day of year - 1)
inds <- dat$days > dat$day_in_year - 1
#Create a new dataframe with adjusted values
other_df <- data.frame(days = dat$days[inds] - dat$day_in_year[inds] + 1,
year = dat$year[inds] - 1)
#Update the original data
dat$days[inds] <- dat$day_in_year[inds] - 1
#Combine the two dataframe then aggregate
aggregate(days~year, rbind(dat[c('days', 'year')], other_df), sum)
# year days
#1 2018 334
#2 2019 90
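Since the question asks for dplyr, the same split-and-stack idea can be sketched with dplyr verbs. This is only one possible translation, not the answerer's code: the helper columns elapsed and excess and the object names split_days and res are made up for illustration, and it assumes the excess never reaches back further than the immediately preceding year.

```r
library(dplyr)

dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

split_days <- dat %>%
  mutate(
    year    = as.integer(format(date, "%Y")),
    elapsed = as.integer(format(date, "%j")) - 1,  # days elapsed before `date`
    excess  = pmax(days - elapsed, 0)              # part belonging to the previous year
  )

res <- bind_rows(
  # part of each row that stays in its own year
  transmute(split_days, year, days = days - excess),
  # excess part, moved to the previous year
  split_days %>% filter(excess > 0) %>% transmute(year = year - 1L, days = excess)
) %>%
  group_by(year) %>%
  summarise(days = sum(days))
```

For the sample data, res has 334 days for 2018 and 90 days for 2019.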
A possible tidyverse way:
library(tidyverse)
dat %>%
  group_by(year = as.integer(format(date, '%Y'))) %>%
  # excess = days beyond those elapsed since 1 January of the same year
  mutate(excess = days - (date - as.Date(paste0(year, '-01-01'))),
         days = ifelse(excess > 0, days - excess, days)) %>%
  summarise(days = sum(days), excess = as.integer(sum(excess[excess > 0]))) %>%
  ungroup %>%
  # give every year in the range a row, then move each year's excess back one year
  complete(year = seq(min(year), max(year)), fill = list(excess = 0)) %>%
  mutate(days = days + lead(excess, default = 0), excess = NULL)
Output:
# A tibble: 2 x 2
year days
<int> <dbl>
1 2018 334
2 2019 90
Basically using tapply, getting the year from the first four characters via substr. This gives the plain yearly sums, before any days are transferred.
data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
# days
# 2018 212
# 2019 212
If the year is needed as a column, it is probably better to use aggregate.
with(dat, aggregate(list(days=days), list(date=substr(date, 1, 4)), sum))
# date days
# 1 2018 212
# 2 2019 212
To transfer the excess days one year back, we could write a function fun that subtracts 1 January of the same year from each date, and use it to get the transfers tr.
fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr <- with(dat, as.numeric(days - fun(date)))
tapply solution:
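res <- data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
# tr[tr > 0] is the single positive transfer (122 days here);
# c(1, -1) adds it to 2018 and subtracts it from 2019
transform(res, days=days + tr[tr > 0] * c(1, -1))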
# days
# 2018 334
# 2019 90
Similar using aggregate:
res2 <- with(dat, aggregate(list(days=days),
list(date=substr(date, 1, 4)), sum))
transform(res2, days=days + tr[tr > 0] * c(1, -1))
# date days
# 1 2018 334
# 2 2019 90
Data:
dat <- structure(list(date = structure(c(17563, 17683, 17713, 17775,
17928, 17956, 17987), class = "Date"), days = c(0, 120, 30, 62,
153, 28, 31)), class = "data.frame", row.names = c(NA, -7L))
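The transfer step above relies on there being exactly two years and a single positive value in tr. A hedged generalisation, still in base R, might look like the sketch below. The names yr, elapsed, totals, keep and res_general are illustrative, and it assumes the excess for a row never spills further back than the immediately preceding year.

```r
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

yr      <- as.integer(format(dat$date, "%Y"))
elapsed <- as.integer(format(dat$date, "%j")) - 1  # days elapsed before each date
tr      <- pmax(dat$days - elapsed, 0)             # per-row transfer, 0 if none

# Stack the part kept in each row's own year and the part moved to the year before
totals <- tapply(c(dat$days - tr, tr), c(yr, yr - 1L), sum)

# Drop years that appear only as an empty "previous year" (2017 in this data)
keep <- totals > 0 | names(totals) %in% as.character(yr)
res_general <- data.frame(year = as.integer(names(totals)[keep]),
                          days = as.vector(totals[keep]))
```

On the sample data this gives 334 days for 2018 and 90 days for 2019, the same as the two-year version, but it should also cope with several years and several transfers.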