简体   繁体   中英

R unlist and multiply (date intervals)

trying to calculate number of cases between 2 dates, there is table that contains a number and time interval, i want to create output table with dates and sum of cases. The easy (and resolved) problem is:

 df <- data.frame(person = c("A", "B", "C"), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )
 df

 f1 = function() {  #keeping dates
   as.data.frame(table(unlist(apply(df[-1], 1, 
                                    function(x) as.character(seq(as.Date(x[1], "%Y-%m-%d"), 
                                                                 as.Date(x[2], "%Y-%m-%d"), "1 day"))))))}
 f1()

and it would return

        Var1 Freq
1 2014-01-01    1
2 2014-01-02    1
3 2014-01-03    1
4 2014-01-04    2
5 2014-01-05    1
6 2014-01-06    1

what i would need is to summarize the first column instead, with the input data like this

 df <- data.frame(cases = c(5, 2, 2), start = c("2014-01-01", "2014-01-03", "2014-01-04"), stop = c("2014-01-02", "2014-01-06", "2014-01-04") )

it should return

        Var1 cases
1 2014-01-01    5
2 2014-01-02    5
3 2014-01-03    2
4 2014-01-04    4
5 2014-01-05    2
6 2014-01-06    2

maybe it would not be even case for unlisting, what can i potentially use to calculate number of cases per day? and is there any way to show 0 values if the date is valid between start and end but does not have any occurrences in the data

EDIT

Aichao's answer is what i needed - the only missing bit is to get 0 sums for example in

df <- data.frame(cases = c(5, 2, 2), 
start = c("2014-01-01", "2014-01-04", "2014-01-04"), 
stop = c("2014-01-02", "2014-01-06", "2014-01-04") )

to get

        Var1 x
1 2014-01-01 5
2 2014-01-02 5
3 2014-01-03 0
4 2014-01-04 4
5 2014-01-05 2
6 2014-01-06 2

Here is a solution that is in-line with what you are doing with f1 :

f2 <- function(df) {
  df2 <- do.call(rbind, lapply(1:nrow(df), function(i) {
    Var1 <- as.character(seq(as.Date(df$start[i],format="%Y-%m-%d"),
                             as.Date(df$stop[i],format="%Y-%m-%d"),"day"))
    cases <- rep(df$cases[i],length(Var1))
    data.frame(Var1,cases)
  }))
  aggregate(df2[,-1], by=list(Var1=df2[,1]), FUN=sum)
}

In f2 :

  1. Build a data frame df2 from the sequence of dates from df$start to df$stop for each row in df . Here, lapply is used to loop over each row of df and the cases for each row is repeated to match the length of the resulting sequence of dates. Then use rbind to combine each of these data frames by rows.
  2. Then aggregate (from the stats package) this df2 by dates (ie, Var1 ) and sum up the cases .

With your data:

f2(df)
##        Var1 x
##1 2014-01-01 5
##2 2014-01-02 5
##3 2014-01-03 2
##4 2014-01-04 4
##5 2014-01-05 2
##6 2014-01-06 2

One way to fill in the missing dates with 0 as cases is to take the aggregated results from the above solution and creating a new sequence of dates spanning the range of dates. This will create the Var1 column for the new output. Then, it is a matter of copying the cases from the old result to the new output that matches on date:

f2 <- function(df) {
  df2 <- do.call(rbind, lapply(1:nrow(df), function(i) {
    ## note that we do not convert to characters here because we want to use these later to form the sequence
    Var1 <- seq(as.Date(df$start[i],format="%Y-%m-%d"),
                as.Date(df$stop[i],format="%Y-%m-%d"),"day")
    cases <- rep(df$cases[i],length(Var1))
    data.frame(Var1,cases)
  }))
  df2 <- aggregate(df2[,-1], by=list(Var1=df2[,1]), FUN=sum)
  ## sort previous result by date
  df2 <- df2[order(df2[,1]),]
  ## create new sequence spanning range
  Var1 <- as.character(seq(df2[1,1],df2[nrow(df2),1],"day"))
  ## create cases of zeros matching Var1 in length
  cases <- rep(0,length(Var1))
  ## copy over cases from previous result that matches date
  cases[na.omit(match(as.character(df2[,1]),Var1))] <- df2[,2]
  ## output as data frame
  data.frame(Var1,cases)
}

On your updated data:

f2(df)
##        Var1 cases
##1 2014-01-01     5
##2 2014-01-02     5
##3 2014-01-03     0
##4 2014-01-04     4
##5 2014-01-05     2
##6 2014-01-06     2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM