Grouping by year in R
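I have a dataset that I want to group by year (summing days), but if the days value for a given date is larger than the number of days that have elapsed so far in that year, the extra days should be added to the previous year. For example, of the 153 days associated with 2019-02-01, 31 days should count towards 2019 and 122 days towards 2018.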
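The arithmetic in this example can be checked directly. A minimal sketch, assuming "days elapsed so far" means the number of days between January 1 of that year and the date itself; the variable names are illustrative and do not come from the original post:

```r
# Worked example for the 2019-02-01 row (all names here are illustrative)
d    <- as.Date("2019-02-01")
days <- 153

# Days elapsed in the year before this date (Jan 1 through Jan 31)
elapsed <- as.integer(d - as.Date(format(d, "%Y-01-01")))

this_year <- min(days, elapsed)      # portion that stays in 2019
prev_year <- max(days - elapsed, 0)  # portion moved back to 2018
c(this_year = this_year, prev_year = prev_year)
```

Under that assumption the split comes out as 31 days for 2019 and 122 days for 2018, matching the figures in the question.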
Data
dat <- data.frame(date = as.Date( c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01", "2019-02-01", "2019-03-01", "2019-04-01") ),
days = c(0, 120, 30, 62, 153, 28, 31))
date days
2018-02-01 0
2018-06-01 120
2018-07-01 30
2018-09-01 62
2019-02-01 153
2019-03-01 28
2019-04-01 31
Expected output
year days
2018 334
2019 90
How can I do this in R? (Ideally using dplyr, but base R is fine if that is the only way.)
Here is one way using base R:
#Get day of the year
dat$day_in_year <- as.integer(format(dat$date, "%j"))
#Get year from date
dat$year <- as.integer(format(dat$date, "%Y"))
#Index where days exceeds the days already elapsed in the year (day_in_year - 1)
inds <- dat$days > dat$day_in_year - 1
#Create a new dataframe with adjusted values
other_df <- data.frame(days = dat$days[inds] - dat$day_in_year[inds] + 1,
year = dat$year[inds] - 1)
#Update the original data
dat$days[inds] <- dat$day_in_year[inds] - 1
#Combine the two dataframe then aggregate
aggregate(days~year, rbind(dat[c('days', 'year')], other_df), sum)
# year days
#1 2018 334
#2 2019 90
One possible tidyverse way:
library(tidyverse)
dat %>% group_by(year = as.integer(format(date, '%Y'))) %>%
mutate(excess = days - (date - as.Date(paste0(year, '-01-01'))),
days = ifelse(excess > 0, days - excess, days)) %>%
summarise(days = sum(days), excess = as.integer(sum(excess[excess > 0]))) %>%
ungroup %>%
complete(year = seq(min(year), max(year)), fill = list(excess = 0)) %>%
mutate(days = days + lead(excess, default = 0), excess = NULL)
Output:
# A tibble: 2 x 2
year days
<int> <dbl>
1 2018 334
2 2019 90
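The same idea can also be written by splitting every row into a part that stays in its own year and a part that moves to the previous year, then summing. This is only a sketch: it assumes, as in the sample data, that no days value reaches back more than one year, and the names elapsed, excess, split_days and result are illustrative.

```r
library(dplyr)

dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

split_days <- dat %>%
  mutate(year    = as.integer(format(date, "%Y")),
         # days elapsed in the year before `date`
         elapsed = as.integer(format(date, "%j")) - 1L,
         # days that do not fit into the current year
         excess  = pmax(days - elapsed, 0))

result <- bind_rows(
  transmute(split_days, year, days = days - excess),      # part kept in the same year
  transmute(split_days, year = year - 1L, days = excess)  # part moved back one year
) %>%
  # drop empty previous-year rows for years that are not in the data
  filter(days > 0 | year %in% split_days$year) %>%
  group_by(year) %>%
  summarise(days = sum(days))

result
```

Because each row carries its own excess, this form does not need complete() or lead() to line the years up.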
Basically, use tapply, taking the year from the first four characters of the date with substr.
data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
# days
# 2018 212
# 2019 212
If you need the year as a column, aggregate may be the better choice.
with(dat, aggregate(list(days=days), list(date=substr(date, 1, 4)), sum))
# date days
# 1 2018 212
# 2 2019 212
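To get the number of days to transfer to the previous year, we can write a function fun that subtracts January 1 of the same year from each date, and use it to compute the transfer tr.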
fun <- function(d) d - as.Date(paste0(substr(d, 1, 4), "-01-01"))
tr <- with(dat, as.numeric(days - fun(date)))
tapply solution:
res <- data.frame(days=with(dat, tapply(days, substr(date, 1, 4), sum)))
transform(res, days=days + tr[tr > 0] * c(1, -1))
# days
# 2018 334
# 2019 90
Similarly, using aggregate:
res2 <- with(dat, aggregate(list(days=days),
list(date=substr(date, 1, 4)), sum))
transform(res2, days=days + tr[tr > 0] * c(1, -1))
# date days
# 1 2018 334
# 2 2019 90
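The c(1, -1) step works for this data because exactly one row has a positive transfer and the data span exactly two years. A hedged sketch of a variant that does not depend on that, still assuming that no transfer reaches back more than one year (yr, elapsed, moved, keep and totals are illustrative names):

```r
dat <- data.frame(
  date = as.Date(c("2018-02-01", "2018-06-01", "2018-07-01", "2018-09-01",
                   "2019-02-01", "2019-03-01", "2019-04-01")),
  days = c(0, 120, 30, 62, 153, 28, 31)
)

yr      <- as.integer(format(dat$date, "%Y"))
# days elapsed in the year before each date
elapsed <- as.numeric(dat$date - as.Date(paste0(yr, "-01-01")))
# days that belong to the previous year (0 when everything fits)
moved   <- pmax(dat$days - elapsed, 0)
keep    <- moved > 0

# Stack the part kept in the same year and the part moved back, then sum by year
totals <- aggregate(
  days ~ year,
  data.frame(year = c(yr, yr[keep] - 1L),
             days = c(dat$days - moved, moved[keep])),
  sum
)
totals
```

Each positive transfer is attached to its own previous year, so several such rows, or more than two years, are handled without relying on the order of the result.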
Data:
dat <- structure(list(date = structure(c(17563, 17683, 17713, 17775,
17928, 17956, 17987), class = "Date"), days = c(0, 120, 30, 62,
153, 28, 31)), class = "data.frame", row.names = c(NA, -7L))