R: converting start/end dates into data series

I have the following data frame representing user subscriptions:

User  StartDate   EndDate
1     2015-09-03  2015-10-17
2     2015-10-27  2015-12-25
...

How can I transform it into a time series that gives the count of active subscriptions per month? A subscription counts as active in a month if it is active for at least one day of that month. The result should look something like this (based on the example above, assuming only these 2 records):

Month    Count
2015-08  0
2015-09  1
2015-10  2
2015-11  1
2015-12  1
2016-01  0

Note: I chose arbitrary start and end dates for the time series to make the example clear.

Prepare the data and make sure that the date columns are actually stored as dates:

data <- read.table(text = "User  StartDate   EndDate
1     2015-09-03  2015-10-17
2     2015-10-27  2015-12-25", header = TRUE)
data$StartDate <- as.Date(data$StartDate)
data$EndDate <- as.Date(data$EndDate)

This function returns a vector of all months that fall within a subscription:

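library(lubridate)

subscr_month <- function(start, end) {
  # Round the start date down to the first day of its month
  start <- floor_date(start, "month")
  # One date per month from the rounded start up to the end date
  month_starts <- seq(start, end, by = "1 month")
  # Keep only year and month, e.g. "2015-09"
  format(month_starts, format = "%Y-%m")
}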

It uses the function floor_date() from the lubridate package. Rounding the start date down to the first of the month is necessary because otherwise the last month might be missing. For example, for user 2, adding two months to the start date gives 2015-12-27, which is after the end date, so no date from December would be included in the resulting sequence. The last line converts the dates to character strings that contain only the year and month.
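To see the effect of the rounding, here is a minimal sketch for user 2. It uses base R only, with as.Date(format(start, "%Y-%m-01")) as an assumed stand-in for floor_date(start, "month"); the variable names are illustrative and not part of the answer's code.

```r
start <- as.Date("2015-10-27")
end   <- as.Date("2015-12-25")

# Without rounding, the steps land on the 27th of each month,
# so 2015-12-27 falls after the end date and December is dropped
unfloored <- format(seq(start, end, by = "1 month"), "%Y-%m")

# Assumed base-R stand-in for floor_date(start, "month"): reset the day to the 1st
start_floor <- as.Date(format(start, "%Y-%m-01"))
floored <- format(seq(start_floor, end, by = "1 month"), "%Y-%m")

print(unfloored)  # "2015-10" "2015-11"
print(floored)    # "2015-10" "2015-11" "2015-12"
```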

Now, you can apply this function to each start and end date from your data using mapply() . Afterwards, table() creates a table of counts of all dates in the resulting list:

all_month <- mapply(subscr_month, data$StartDate, data$EndDate, SIMPLIFY = FALSE)
table(unlist(all_month))
## 2015-09 2015-10 2015-11 2015-12 
##       1       2       1       1 
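SIMPLIFY = FALSE matters here: by default, mapply() tries to combine the results into a vector or matrix whenever they all have the same length, so the shape of the result would depend on the data. The sketch below illustrates this with a hypothetical base-R helper months_between() (not the subscr_month() function above) and made-up end dates chosen so that both subscriptions span two months.

```r
# Hypothetical helper: months covered by one subscription, base R only
months_between <- function(start, end) {
  first <- as.Date(format(start, "%Y-%m-01"))
  format(seq(first, end, by = "1 month"), "%Y-%m")
}

starts <- as.Date(c("2015-09-03", "2015-10-27"))
ends   <- as.Date(c("2015-10-17", "2015-11-25"))  # made-up: both span 2 months

# Always a list with one element per subscription
as_list <- mapply(months_between, starts, ends, SIMPLIFY = FALSE)

# Default SIMPLIFY = TRUE: equal-length results are combined into a matrix
as_matrix <- mapply(months_between, starts, ends)

str(as_list)
dim(as_matrix)
```

Since unlist() is applied afterwards anyway, keeping the result as a list is the more predictable choice.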

You can also convert the table to a data frame:

as.data.frame(table(unlist(all_month)))
##      Var1 Freq
## 1 2015-09    1
## 2 2015-10    2
## 3 2015-11    1
## 4 2015-12    1

Your example output also includes counts for months that do not appear in the data set. If you want these as well, you can convert the vector of months to a factor and set its levels to all the months you want to include:

month_list <- format(seq(as.Date("2015-08-01"), as.Date("2016-01-01"), by = "1 month"), format = "%Y-%m")
all_month_factor <- factor(unlist(all_month), levels = month_list)
table(all_month_factor)
## all_month_factor
## 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01 
##       0       1       2       1       1       0 
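If the result should have the Month/Count layout from the question, the factor-based table can be turned into a data frame with those column names. This is only a sketch: months_active is an illustrative stand-in for unlist(all_month), written out by hand so the block runs on its own.

```r
# Stand-in for unlist(all_month) from the two example subscriptions
months_active <- c("2015-09", "2015-10", "2015-10", "2015-11", "2015-12")

# All months to report, including those without any subscription
month_list <- format(seq(as.Date("2015-08-01"), as.Date("2016-01-01"),
                         by = "1 month"), "%Y-%m")

# Unused factor levels are kept by table() and get a count of 0
counts <- table(factor(months_active, levels = month_list))

result <- data.frame(Month = names(counts),
                     Count = as.integer(counts),
                     stringsAsFactors = FALSE)
print(result)
```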

First, read in the data frame from the question:

df = structure(list(StartDate = structure(c(16681, 16735), class = "Date"), 
    EndDate = structure(c(16725, 16794), class = "Date")), class = "data.frame", .Names = c("StartDate", 
"EndDate"), row.names = c(NA, -2L))

You could then use do() from the dplyr package together with seq():

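library(dplyr)

df %>%
  rowwise() %>%
  do({
    # Step in 15-day increments (shorter than any month) so no month is skipped;
    # EndDate is appended so that the final month is always included
    w <- c(seq(.$StartDate, .$EndDate, by = "15 days"), .$EndDate)
    m <- format(w, "%Y-%m") %>% unique
    data.frame(Month = m)
  }) %>%
  group_by(Month) %>%
  summarise(Count = length(Month))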
