
R: count the occurrences of a specific event within a specified future time window

My simplified data looks like this:

set.seed(1453); x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
             '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')


df = data.frame(x, date = as.Date(date))


df 
x       date
1 2016-01-01
0 2016-01-05
1 2016-01-07
0 2016-01-12
0 2016-01-16
1 2016-01-20
1 2016-01-20
0 2016-01-25
0 2016-01-26
1 2016-01-31

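I want to count the number of occurrences of x == 1 within a specified time period, for example within 14 and 30 days after the current date (excluding the current entry itself, if it has x == 1). The desired output looks like this: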

solution
x       date x_plus14 x_plus30
1 2016-01-01        1        3
0 2016-01-05        1        4
1 2016-01-07        2        3
0 2016-01-12        2        3
0 2016-01-16        2        3
1 2016-01-20        2        2
1 2016-01-20        1        1
0 2016-01-25        1        1
0 2016-01-26        1        1
1 2016-01-31        0        0

Ideally I would like to do this with dplyr, but that is not a requirement. Any ideas on how to achieve this? Thank you very much for your help!

Adding another approach, based on findInterval:

cs = cumsum(df$x) # cumulative number of occurrences (df must be sorted by date)
data.frame(df, 
           plus14 = cs[findInterval(df$date + 14, df$date, left.open = TRUE)] - cs, 
           plus30 = cs[findInterval(df$date + 30, df$date, left.open = TRUE)] - cs)
#   x       date plus14 plus30
#1  1 2016-01-01      1      3
#2  0 2016-01-05      1      4
#3  1 2016-01-07      2      3
#4  0 2016-01-12      2      3
#5  0 2016-01-16      2      3
#6  1 2016-01-20      2      2
#7  1 2016-01-20      1      1
#8  0 2016-01-25      1      1
#9  0 2016-01-26      1      1
#10 1 2016-01-31      0      0
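
The cumulative-sum idea above can be wrapped in a small function so that any window length can be passed in. The following is only a sketch: count_ahead() is a hypothetical helper name, the data are hard-coded from the table in the question (so the result does not depend on the random number generator), and the dates are assumed to be sorted in ascending order.

```r
# Data copied from the table printed in the question
x    <- c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# count_ahead() is a hypothetical helper, not part of any package.
# Assumes `date` is sorted in ascending order.
count_ahead <- function(x, date, window) {
  cs <- cumsum(x)
  # With left.open = TRUE, findInterval() returns the number of dates that
  # are strictly earlier than date + window, i.e. the index of the last row
  # that still falls inside the window.
  last <- findInterval(date + window, date, left.open = TRUE)
  # Occurrences up to that row minus occurrences up to and including this row
  cs[last] - cs
}

count_ahead(x, date, 14)
# 1 1 2 2 2 2 1 1 1 0
count_ahead(x, date, 30)
# 3 4 3 3 3 2 1 1 1 0
```

Because cs already includes the current row, subtracting it is what excludes the current entry from its own count.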

Earlier I had not accounted for the current date, so the numbers did not match.

library(data.table)
setDT(df)[, `:=`(x14 = sum(df$x[between(df$date, date, date + 14, incbounds = FALSE)]), 
                 x30 = sum(df$x[between(df$date, date, date + 30, incbounds = FALSE)])),
              by = date]

#     x       date x14 x30
#  1: 1 2016-01-01   1   3
#  2: 0 2016-01-05   1   4
#  3: 1 2016-01-07   2   3
#  4: 0 2016-01-12   2   3
#  5: 0 2016-01-16   2   3
#  6: 1 2016-01-20   1   1
#  7: 1 2016-01-20   1   1
#  8: 0 2016-01-25   1   1
#  9: 0 2016-01-26   1   1
# 10: 1 2016-01-31   0   0

Or a general solution that works for any desired set of ranges:

vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) := 
            lapply(vec, function(i) sum(df$x[between(df$date, 
                                                     date, 
                                                     date + i, 
                                                     incbounds = FALSE)])),
            by = date]

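Here is my attempt with dplyr and some help from purrr. I get slightly different counts because of the comparison operators (<= and >=) used in the helper function x_next(); if you adjust them appropriately, I think you should be able to get what you want. Hope that helps.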

library("tidyverse")
library("lubridate")
set.seed(1453)

x = sample(0:1, 10, TRUE)
dates = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
         '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')

df = tibble(x = x, dates = lubridate::as_date(dates))  # tibble() replaces the deprecated data_frame()

# helper function to calculate the sum of xs in the next days_in_future
x_next <- function(d, days_in_future) {

  df %>% 
    # subset on days of interest
    filter(dates > d & dates <= d + days(days_in_future)) %>% 
    # sum up xs
    summarise(sum = sum(x)) %>% 
    # have to unlist them so that the (following) call to mutate works
    unlist(use.names=F)
  }

# mutate your df; map_dbl() returns plain numeric columns rather than list columns
df %>% 
  mutate(xplus14 = map_dbl(dates, x_next, 14),
         xplus30 = map_dbl(dates, x_next, 30))
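
To illustrate how the choice of comparison operator changes the counts, here is a base R sketch. count_window() and its include_end argument are hypothetical names introduced only for this illustration, and the data are hard-coded from the table in the question.

```r
# Data copied from the table printed in the question
x    <- c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: sum x over strictly later dates inside the window.
# include_end = TRUE uses `<=` at the upper bound, FALSE uses `<`.
count_window <- function(x, date, window, include_end = TRUE) {
  vapply(seq_along(date), function(i) {
    upper <- date[i] + window
    in_window <- date > date[i] &
      (if (include_end) date <= upper else date < upper)
    as.numeric(sum(x[in_window]))
  }, numeric(1))
}

count_window(x, date, 30, include_end = TRUE)
# 4 4 3 3 3 1 1 1 1 0
count_window(x, date, 30, include_end = FALSE)
# 3 4 3 3 3 1 1 1 1 0
```

Only the first row differs here: 2016-01-31 lies exactly 30 days after 2016-01-01, so it is counted with <= and dropped with <.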

A concise dplyr + purrr solution:

library(tidyverse)

# df is the data frame from the question; map_dbl() avoids list columns
df %>% 
  mutate(x_plus14 = map_dbl(date, ~sum(x == 1 & between(date, . + 1, . + 14))),
         x_plus30 = map_dbl(date, ~sum(x == 1 & between(date, . + 1, . + 30))))

   x       date x_plus14 x_plus30
1  1 2016-01-01        1        4
2  0 2016-01-05        1        4
3  1 2016-01-07        2        3
4  0 2016-01-12        2        3
5  0 2016-01-16        2        3
6  1 2016-01-20        1        1
7  1 2016-01-20        1        1
8  0 2016-01-25        1        1
9  0 2016-01-26        1        1
10 1 2016-01-31        0        0

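As others have already mentioned, it is odd that you do not count the current date itself, and you should avoid naming objects after functions (sample). That said, the code below reproduces your desired output: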

set.seed(1453); 
x = sample(0:1, 10, TRUE)
date = c('2016-01-01', '2016-01-05', '2016-01-07',  '2016-01-12',  '2016-01-16',  '2016-01-20',
             '2016-01-20',  '2016-01-25',  '2016-01-26',  '2016-01-31')


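# Build the data frame from the x and date vectors defined above
sample = data.frame(x = x, date = as.Date(date))

# Sum x over the rows dated strictly after one_row's date
# and strictly before that date + date_range
getOccurences <- function(one_row, sample_data, date_range){
  one_date <- as.Date(one_row[2])
  sum(sample_data$x[sample_data$date > one_date & 
                    sample_data$date < one_date + date_range])
}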

sample$x_plus14 <- apply(sample,1,getOccurences, sample, 14)
sample$x_plus30 <- apply(sample,1,getOccurences, sample, 30)

sample

   x       date x_plus14 x_plus30
1  1 2016-01-01        1        3
2  0 2016-01-05        1        4
3  1 2016-01-07        2        3
4  0 2016-01-12        2        3
5  0 2016-01-16        2        3
6  1 2016-01-20        1        1
7  1 2016-01-20        1        1
8  0 2016-01-25        1        1
9  0 2016-01-26        1        1
10 1 2016-01-31        0        0
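
Note that purely date-based comparisons give both 2016-01-20 rows the same count (1 and 1), whereas the desired output in the question lists 2 and 2 for the first of them. One possible reading, and it is only an assumption, is that rows further down the table count as later even when they share a date. A base R sketch under that assumption, with count_later_rows() as a hypothetical helper name and the data hard-coded from the question:

```r
# Data copied from the table printed in the question
x    <- c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))

# Hypothetical helper: for row i, sum x over the rows below it whose date
# is strictly before date[i] + window. Assumes rows are ordered by date.
count_later_rows <- function(x, date, window) {
  n <- length(date)
  vapply(seq_len(n), function(i) {
    later <- seq_len(n) > i                       # rows below row i
    as.numeric(sum(x[later & date < date[i] + window]))
  }, numeric(1))
}

count_later_rows(x, date, 14)
# 1 1 2 2 2 2 1 1 1 0
count_later_rows(x, date, 30)
# 3 4 3 3 3 2 1 1 1 0
```

Under this reading the second 2016-01-20 row is counted for the first one, which matches the x_plus14 and x_plus30 columns shown in the question.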

