
Create New Column Based on Previous Row and Multiple Conditions in R

I have the following sample data frame:

x
date          product   release    
2012-01-01    A         0                   
2012-01-02    A         0                   
2012-01-03    A         0                   
2012-01-04    A         1 
2012-01-05    A         0     
2012-01-06    A         0   
2012-01-07    A         0   
2012-01-08    A         0   
2012-01-09    A         0   
2012-01-10    A         0   
2012-01-11    A         0   
2012-01-12    A         0 
2012-01-01    Z         0                   
2012-01-02    Z         1                   
2012-01-03    Z         0                   
2012-01-04    Z         0   
2012-01-05    Z         0     
2012-01-06    Z         0   
2012-01-07    Z         0 
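
To reproduce the problem, the sample above can be rebuilt roughly as follows. This is only a sketch: it assumes date is stored as a Date and product as a character column, which the question does not state.

```r
# Rebuild the sample data frame `x` shown above
# (assumed column types: Date, character, integer)
x <- data.frame(
  date    = as.Date("2012-01-01") + c(0:11, 0:6),  # 12 days for A, 7 for Z
  product = rep(c("A", "Z"), times = c(12, 7)),
  release = 0L
)

# Mark the release days: 2012-01-04 for A, 2012-01-02 for Z
x$release[x$product == "A" & x$date == as.Date("2012-01-04")] <- 1L
x$release[x$product == "Z" & x$date == as.Date("2012-01-02")] <- 1L
```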

I want to iterate through each row and generate a dayssince column based on how many days it's been since the release.

A few things to keep in mind:
- release = 1 means a new product was released; release = 0 means no release
- the output needs to be unique to each date and product combination

The desired output would be:

   x
    date          product   release    dayssince
    2012-01-01    A         0          0         
    2012-01-02    A         0          0        
    2012-01-03    A         0          0        
    2012-01-04    A         1          1
    2012-01-05    A         0          2
    2012-01-06    A         0          3
    2012-01-07    A         0          4
    2012-01-08    A         0          5
    2012-01-09    A         0          6
    2012-01-10    A         0          7
    2012-01-11    A         0          8
    2012-01-12    A         0          9
    2012-01-01    Z         0          0        
    2012-01-02    Z         1          1        
    2012-01-03    Z         0          2        
    2012-01-04    Z         0          3
    2012-01-05    Z         0          4
    2012-01-06    Z         0          5
    2012-01-07    Z         0          6

I've tried everything I could think of from ifelse statements and for loops to ddply.

The simplest way I've been able to approach the problem is to do the following conceptually:

x$dayssince <- ifelse(x$release > 0, 1, 0)

- Then check each row in dayssince.
- If dayssince == 1, keep 1.
- If dayssince < 1, check the row above.
- If the row above is > 0, use the value of the row above + 1.
- All of this is done separately for each product.
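
Taken literally, the steps above could be sketched as a plain loop. The helper name days_since_loop is made up for illustration, and the sketch assumes the rows are already sorted by product and then by date.

```r
# Hypothetical helper: a literal row-by-row version of the steps above.
# Assumes rows are sorted by product, then by date.
days_since_loop <- function(release, product) {
  out <- ifelse(release > 0, 1, 0)            # start from the 0/1 flag
  for (i in seq_along(out)) {
    if (i > 1 && out[i] == 0 &&               # not a release day
        product[i] == product[i - 1] &&       # still the same product
        out[i - 1] > 0) {                     # counting has already started
      out[i] <- out[i - 1] + 1
    }
  }
  out
}

# Small two-product example
release <- c(0, 0, 1, 0, 0, 0, 1, 0)
product <- c("A", "A", "A", "A", "A", "Z", "Z", "Z")
days_since_loop(release, product)
# 0 0 1 2 3 0 1 2
```

A release row resets the count to 1 on its own, so repeated releases of the same product need no extra handling.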

Thank you in advance!

UPDATE/CLARIFICATION:

For products that release multiple times per year, I'm looking to get the number of days since the last release.

For example:

    x
    date          product   release    dayssince
    2012-01-01    A         0          0         
    2012-01-02    A         0          0        
    2012-01-03    A         0          0        
    2012-01-04    A         1          1
    2012-01-05    A         0          2
    2012-01-06    A         0          3
    2012-01-07    A         0          4
    2012-01-08    A         0          5
    2012-01-09    A         0          6
    2012-01-10    A         1          1
    2012-01-11    A         0          2
    2012-01-12    A         0          3
    2012-01-13    A         0          4
    2012-01-14    A         0          5

etc... Thanks for the flag @DMC

You can try using ave from base R:

 x$dayssince <-  with(x, ave(release, cumsum(release), product, 
                          FUN=function(y) cumsum(cumsum(y))))

Or using data.table:

library(data.table)
setDT(x)[,dayssince:=cumsum(cumsum(release)) ,
                   .(product,cumsum(release))][]
 #  1: 2012-01-01       A       0         0
 #  2: 2012-01-02       A       0         0
 #  3: 2012-01-03       A       0         0
 #  4: 2012-01-04       A       1         1
 #  5: 2012-01-05       A       0         2
 #  6: 2012-01-06       A       0         3
 #  7: 2012-01-07       A       0         4
 #  8: 2012-01-08       A       0         5
 #  9: 2012-01-09       A       0         6
 # 10: 2012-01-10       A       1         1
 # 11: 2012-01-11       A       0         2
 # 12: 2012-01-12       A       0         3
 # 13: 2012-01-01       Z       0         0
 # 14: 2012-01-02       Z       1         1
 # 15: 2012-01-03       Z       0         2
 # 16: 2012-01-04       Z       0         3
 # 17: 2012-01-05       Z       0         4
 # 18: 2012-01-06       Z       0         5
 # 19: 2012-01-07       Z       0         6
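
Both versions rely on the same cumsum(cumsum()) trick, which may be easier to follow on a single short vector. The sketch below uses one product only, with a made-up release vector; the variable name grp is illustrative.

```r
# One product, with releases on rows 3 and 6
release <- c(0, 0, 1, 0, 0, 1, 0)

# Step 1: the running count of releases labels each stretch of rows
grp <- cumsum(release)                 # 0 0 1 1 1 2 2

# Step 2: inside one stretch the values are 1 0 0 ..., so
#   cumsum(y)          gives 1 1 1 ...
#   cumsum(cumsum(y))  gives 1 2 3 ...
# Rows before the first release are all 0 and therefore stay 0.
dayssince <- ave(release, grp, FUN = function(y) cumsum(cumsum(y)))
dayssince                              # 0 0 1 2 3 1 2
```

Adding product as a further grouping variable, as in the code above, keeps the count from carrying over from one product to the next.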

This solution uses dplyr and creates an intermediate variable, release_num:

library(dplyr)

x %>%
  group_by(product) %>%
  mutate(release_num = cumsum(release)) %>%
  group_by(product, release_num) %>%
  mutate(dayssince = cumsum(cumsum(release)))

One comment: you ask for a solution that 'iterates row-by-row.' That isn't the idiomatic R way of doing things. R works on vectors, typically whole columns, so a strictly row-wise solution requires a bit of a workaround. You could switch to something like SAS, which does explicitly work row-wise.
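
To illustrate the column-wise style, here is a sketch that gets the same count for a single product without visiting rows one at a time. It is a different technique from the ones posted in this thread (it uses cummax), and it assumes release contains only 0 and 1 and that the rows are sorted by date.

```r
# One product, rows sorted by date, releases on rows 3 and 6
release <- c(0, 0, 1, 0, 0, 1, 0)
idx <- seq_along(release)

# Row index of the most recent release at or before each row (0 if none yet)
last_rel <- cummax(idx * release)      # 0 0 3 3 3 6 6

# Distance from that release, counting the release day itself as 1
dayssince <- ifelse(last_rel > 0, idx - last_rel + 1, 0)
dayssince                              # 0 0 1 2 3 1 2
```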

My solution uses the plyr library, although it's not vectorized. It may therefore be slower than some alternatives.

# given the row indices of the releases and an output vector, fill in "dayssince"
ds <- function(rel.idx, out) {
  n <- length(rel.idx)
  if (n == 0) return(out)  # no release for this product: keep all zeros
  # each release starts a run that ends on the row before the next release
  ends <- c(rel.idx[-1] - 1, length(out))
  for (i in seq_len(n)) {
    # the release day counts as 1, as in the desired output
    out[rel.idx[i]:ends[i]] <- seq_len(ends[i] - rel.idx[i] + 1)
  }
  return(out)
}

# use ds() on the rows of a single product
ds.prod <- function(dat) {
  dat <- dat[order(dat$date), ]
  rel.idx <- which(dat$release == 1)
  # rows before the first release keep the initial value 0
  dat$dayssince <- ds(rel.idx, out = integer(nrow(dat)))
  return(dat)
}

# split by product and run
library(plyr)
x <- ddply(x, .variables = "product", .fun = ds.prod)

If your data is coming from a database, it may be easier to create a view with a computed column used to calculate the days since release.

I am currently too tired to post any SQL code, but if it is an approach you would consider, I can provide some example code tomorrow.
