Calculate the number of occurrences of a specific event in the past AND future with groupings
R: calculate the number of occurrences of a specific event in a specified time future
My simplified data looks like this:
set.seed(1453); x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16', '2016-01-20',
'2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
df = data.frame(x, date = as.Date(date))
df
x date
1 2016-01-01
0 2016-01-05
1 2016-01-07
0 2016-01-12
0 2016-01-16
1 2016-01-20
1 2016-01-20
0 2016-01-25
0 2016-01-26
1 2016-01-31
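I want to count the number of occurrences of x == 1 within a specified time period, e.g. the 14 and 30 days following the current date (but not including the current entry, if it is x == 1).
The desired output looks like this: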
solution
x date x_plus14 x_plus30
1 2016-01-01 1 3
0 2016-01-05 1 4
1 2016-01-07 2 3
0 2016-01-12 2 3
0 2016-01-16 2 3
1 2016-01-20 2 2
1 2016-01-20 1 1
0 2016-01-25 1 1
0 2016-01-26 1 1
1 2016-01-31 0 0
Ideally I would like this to be in dplyr, but that is not a must. Any ideas how to achieve this? Thanks a lot for your help!
Adding another approach, based on findInterval:
cs = cumsum(df$x) # cumulative number of occurrences
data.frame(df,
           plus14 = cs[findInterval(df$date + 14, df$date, left.open = TRUE)] - cs,
           plus30 = cs[findInterval(df$date + 30, df$date, left.open = TRUE)] - cs)
# x date plus14 plus30
#1 1 2016-01-01 1 3
#2 0 2016-01-05 1 4
#3 1 2016-01-07 2 3
#4 0 2016-01-12 2 3
#5 0 2016-01-16 2 3
#6 1 2016-01-20 2 2
#7 1 2016-01-20 1 1
#8 0 2016-01-25 1 1
#9 0 2016-01-26 1 1
#10 1 2016-01-31 0 0
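To see why the cumsum/findInterval trick works, it may help to look at the intermediate pieces. The sketch below is only an illustration under a few assumptions: x is typed in from the printed data rather than drawn with sample() (newer R versions may draw different values for the same seed), count_ahead() is a made-up helper name, and the dates are assumed to be sorted in ascending order.

```r
# x is hard-coded from the printed data, so the result does not depend on
# the random number generator of the installed R version.
x    <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

cs <- cumsum(x)  # running count of x == 1 up to and including each row

# Hypothetical helper: events in rows after the current one whose date is
# strictly before date + days. Assumes `date` is sorted in ascending order.
count_ahead <- function(days) {
  # with left.open = TRUE, findInterval() returns, for each row,
  # how many dates are strictly smaller than date + days
  last_row <- findInterval(as.numeric(date) + days, as.numeric(date),
                           left.open = TRUE)
  # events up to the last row in the window, minus events up to the current row
  cs[last_row] - cs
}

plus14 <- count_ahead(14)
plus30 <- count_ahead(30)
data.frame(x, date, plus14, plus30)
```

Because the subtraction uses the running count at the current row, a second row with the same date further down is still counted, which is why the first 2016-01-20 row gets 2 here.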
Earlier I did not include the current date, so the numbers did not match.
library(data.table)
setDT(df)[, `:=`(x14 = sum(df$x[between(df$date, date, date + 14, incbounds = FALSE)]),
                 x30 = sum(df$x[between(df$date, date, date + 30, incbounds = FALSE)])),
          by = date]
# x date x14 x30
# 1: 1 2016-01-01 1 3
# 2: 0 2016-01-05 1 4
# 3: 1 2016-01-07 2 3
# 4: 0 2016-01-12 2 3
# 5: 0 2016-01-16 2 3
# 6: 1 2016-01-20 1 1
# 7: 1 2016-01-20 1 1
# 8: 0 2016-01-25 1 1
# 9: 0 2016-01-26 1 1
# 10: 1 2016-01-31 0 0
Or a general solution that works for any desired ranges:
vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) :=
            lapply(vec, function(i) sum(df$x[between(df$date,
                                                     date,
                                                     date + i,
                                                     incbounds = FALSE)])),
          by = date]
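Here is my attempt with some help from dplyr + purrr. Because of the > and <= comparisons in the helper function x_next(), I get slightly different counts; if you adjust them properly, I think you should be able to get what you want. HTH.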
library("tidyverse")
library("lubridate")
set.seed(1453)
x = sample(0:1, 10, TRUE)
dates = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16', '2016-01-20',
'2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
df = tibble(x = x, dates = lubridate::as_date(dates)) # tibble() replaces the deprecated data_frame()
# helper function to calculate the sum of xs in the next days_in_future
x_next <- function(d, days_in_future) {
  df %>%
    # subset on days of interest
    filter(dates > d & dates <= d + days(days_in_future)) %>%
    # sum up xs
    summarise(sum = sum(x)) %>%
    # have to unlist them so that the (following) call to mutate works
    unlist(use.names = FALSE)
}
# mutate your df; map_dbl() returns plain numeric columns instead of list columns
df %>%
  mutate(xplus14 = map_dbl(dates, x_next, 14),
         xplus30 = map_dbl(dates, x_next, 30))
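The "slightly different counts" come down to whether a date lying exactly on the window boundary is included. As a rough base R illustration (window_sum() and its inclusive argument are invented for this sketch, and x is typed in from the printed data rather than sampled), the first row shows the effect, because 2016-01-31 is exactly 30 days after 2016-01-01:

```r
x    <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: sum of x for dates after d, up to d + days.
# `inclusive` decides whether a date exactly `days` ahead is counted.
window_sum <- function(d, days, inclusive = TRUE) {
  upper <- if (inclusive) date <= d + days else date < d + days
  sum(x[date > d & upper])
}

d1 <- date[1]
window_sum(d1, 30, inclusive = TRUE)   # 4: 2016-01-31 is counted
window_sum(d1, 30, inclusive = FALSE)  # 3: 2016-01-31 is left out
```

The desired output in the question has 3 for that row, so it corresponds to the strict upper bound.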
A concise dplyr and purrr solution:
library(tidyverse)
sample %>%
  # map_int() returns plain integer columns instead of list columns
  mutate(x_plus14 = map_int(date, ~sum(x == 1 & between(date, . + 1, . + 14))),
         x_plus30 = map_int(date, ~sum(x == 1 & between(date, . + 1, . + 30))))
x date x_plus14 x_plus30
1 1 2016-01-01 1 4
2 0 2016-01-05 1 4
3 1 2016-01-07 2 3
4 0 2016-01-12 2 3
5 0 2016-01-16 2 3
6 1 2016-01-20 1 1
7 1 2016-01-20 1 1
8 0 2016-01-25 1 1
9 0 2016-01-26 1 1
10 1 2016-01-31 0 0
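As others have already mentioned, it is odd that you do not count the date itself, and you should avoid naming objects after functions (sample). However, the code below reproduces your desired output: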
set.seed(1453)
x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16', '2016-01-20',
         '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
# build the data frame from the date vector defined above
sample = data.frame(x = x, date = as.Date(date))
# count x == 1 on dates strictly between the row's date and date + date_range
getOccurences <- function(one_row, sample_data, date_range){
  one_date <- as.Date(one_row[2])
  sum(sample_data$x[sample_data$date > one_date &
                    sample_data$date < one_date + date_range])
}
sample$x_plus14 <- apply(sample, 1, getOccurences, sample, 14)
sample$x_plus30 <- apply(sample, 1, getOccurences, sample, 30)
sample
x date x_plus14 x_plus30
1 1 2016-01-01 1 3
2 0 2016-01-05 1 4
3 1 2016-01-07 2 3
4 0 2016-01-12 2 3
5 0 2016-01-16 2 3
6 1 2016-01-20 1 1
7 1 2016-01-20 1 1
8 0 2016-01-25 1 1
9 0 2016-01-26 1 1
10 1 2016-01-31 0 0
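Note that the strict date comparison gives 1 and 1 for the first 2016-01-20 row, while the desired output in the question has 2 and 2 there, apparently because the second 2016-01-20 row further down is meant to count as "future". If that is the intent, one possible way is to compare row positions instead of dates for the lower bound. This is only a sketch: count_later_rows() is a made-up name, x is typed in from the printed data, and the rows are assumed to be sorted by date.

```r
x    <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: for each row i, sum x over the rows below i whose
# date is less than `days` days ahead. Assumes rows are sorted by date.
count_later_rows <- function(x, date, days) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    later     <- seq_len(n) > i          # rows below the current one
    in_window <- date < date[i] + days   # strictly before the window end
    sum(x[later & in_window])
  }, numeric(1))
}

res <- data.frame(x, date,
                  x_plus14 = count_later_rows(x, date, 14),
                  x_plus30 = count_later_rows(x, date, 30))
res
```

With these inputs the first 2016-01-20 row gets 2 and 2, and the second one gets 1 and 1, as in the desired output.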