I have the following sample data frame:
x
date product release
2012-01-01 A 0
2012-01-02 A 0
2012-01-03 A 0
2012-01-04 A 1
2012-01-05 A 0
2012-01-06 A 0
2012-01-07 A 0
2012-01-08 A 0
2012-01-09 A 0
2012-01-10 A 0
2012-01-11 A 0
2012-01-12 A 0
2012-01-01 Z 0
2012-01-02 Z 1
2012-01-03 Z 0
2012-01-04 Z 0
2012-01-05 Z 0
2012-01-06 Z 0
2012-01-07 Z 0
I want to iterate through each row and generate a dayssince column based on how many days it's been since the release.
A few things to keep in mind:
- release = 1 means a new product was released; release = 0 means no product was released
- the output needs to be unique to the date and the product
The desired output would be:
x
date product release dayssince
2012-01-01 A 0 0
2012-01-02 A 0 0
2012-01-03 A 0 0
2012-01-04 A 1 1
2012-01-05 A 0 2
2012-01-06 A 0 3
2012-01-07 A 0 4
2012-01-08 A 0 5
2012-01-09 A 0 6
2012-01-10 A 0 7
2012-01-11 A 0 8
2012-01-12 A 0 9
2012-01-01 Z 0 0
2012-01-02 Z 1 1
2012-01-03 Z 0 2
2012-01-04 Z 0 3
2012-01-05 Z 0 4
2012-01-06 Z 0 5
2012-01-07 Z 0 6
I've tried everything I could think of, from ifelse statements and for loops to ddply.
The simplest way I've been able to approach the problem is to do the following conceptually:
x$dayssince <- ifelse(x$release > 0, 1, 0)
- Then check each row in dayssince.
- If dayssince == 1, then 1
- If dayssince < 1, then check row above.
- If row above is > 0, then use value of row above + 1
- All this unique to the product.
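Taken literally, the steps above could be sketched as a plain loop. This is only a minimal sketch: it rebuilds the sample data inline, and it assumes the rows are sorted by product and date with one row per product per day. The helper name same_product is made up for illustration.

```r
# Rebuild the sample data from the question (12 days for A, 7 days for Z)
x <- data.frame(
  date    = as.Date("2012-01-01") + c(0:11, 0:6),
  product = rep(c("A", "Z"), times = c(12, 7)),
  release = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 1, 0, 0, 0, 0, 0)
)

# Assumption: rows must be ordered by product, then date
x <- x[order(x$product, x$date), ]

# Step 1: release rows start at 1, everything else at 0
x$dayssince <- ifelse(x$release > 0, 1, 0)

# Step 2: walk the rows; a non-release row continues the count of the row
# above, but only if that row belongs to the same product and is > 0
for (i in seq_len(nrow(x))[-1]) {
  same_product <- x$product[i] == x$product[i - 1]
  if (x$dayssince[i] < 1 && same_product && x$dayssince[i - 1] > 0) {
    x$dayssince[i] <- x$dayssince[i - 1] + 1
  }
}
```

Because a release row is set to 1 before the loop runs, a second release for the same product restarts the count without extra logic.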
Thank you in advance!
For products that release multiple times per year, I'm looking to get the number of days since the most recent release.
For example:
x
date product release dayssince
2012-01-01 A 0 0
2012-01-02 A 0 0
2012-01-03 A 0 0
2012-01-04 A 1 1
2012-01-05 A 0 2
2012-01-06 A 0 3
2012-01-07 A 0 4
2012-01-08 A 0 5
2012-01-09 A 0 6
2012-01-10 A 1 1
2012-01-11 A 0 2
2012-01-12 A 0 3
2012-01-13 A 0 4
2012-01-14 A 0 5
etc... Thanks for the flag @DMC
You can try using ave from base R:
x$dayssince <- with(x, ave(release, cumsum(release), product,
FUN=function(y) cumsum(cumsum(y))))
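To see why the double cumsum works, here is a small sketch on a single product. The vector and the name grp are made up for illustration. The outer grouping by cumsum(release) starts a new block at every release; inside a block, the first cumsum turns 1, 0, 0 into 1, 1, 1 and the second turns that into 1, 2, 3. Rows before the first release form a block of zeros, so they stay 0.

```r
# Toy release indicator for one product: releases on rows 4 and 7
release <- c(0, 0, 0, 1, 0, 0, 1, 0, 0)

# Block id: increases by one at every release
grp <- cumsum(release)

# Within each block, count up from the release row
dayssince <- ave(release, grp, FUN = function(y) cumsum(cumsum(y)))

print(rbind(release, grp, dayssince))
```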
Or using data.table
library(data.table)
setDT(x)[,dayssince:=cumsum(cumsum(release)) ,
.(product,cumsum(release))][]
# 1: 2012-01-01 A 0 0
# 2: 2012-01-02 A 0 0
# 3: 2012-01-03 A 0 0
# 4: 2012-01-04 A 1 1
# 5: 2012-01-05 A 0 2
# 6: 2012-01-06 A 0 3
# 7: 2012-01-07 A 0 4
# 8: 2012-01-08 A 0 5
# 9: 2012-01-09 A 0 6
# 10: 2012-01-10 A 1 1
# 11: 2012-01-11 A 0 2
# 12: 2012-01-12 A 0 3
# 13: 2012-01-01 Z 0 0
# 14: 2012-01-02 Z 1 1
# 15: 2012-01-03 Z 0 2
# 16: 2012-01-04 Z 0 3
# 17: 2012-01-05 Z 0 4
# 18: 2012-01-06 Z 0 5
# 19: 2012-01-07 Z 0 6
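Note that both versions above count rows, so they implicitly assume one row per product per day. If some dates could be missing, a date-based variant might look like the sketch below. It uses base R only; the small data frame and the helper names grp and since are invented for illustration, and it keeps the question's convention that the release day itself counts as 1.

```r
# Toy data with gaps in the dates (assumption: date is of class Date)
x <- data.frame(
  date    = as.Date(c("2012-01-01", "2012-01-02", "2012-01-05",
                      "2012-01-01", "2012-01-03")),
  product = c("A", "A", "A", "Z", "Z"),
  release = c(0, 1, 0, 1, 0)
)
x <- x[order(x$product, x$date), ]

# Running release counter within each product (0 before the first release)
grp <- ave(x$release, x$product, FUN = cumsum)

# Calendar days since the first date of each (product, release) block,
# with the release day itself counted as day 1
since <- ave(as.numeric(x$date), x$product, grp,
             FUN = function(d) d - d[1] + 1)

# Rows before a product's first release stay 0
x$dayssince <- ifelse(grp > 0, since, 0)
```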
This solution uses dplyr and creates an intermediate variable, release_num:
library(dplyr)
x %>%
  group_by(product) %>%
  mutate(release_num = cumsum(release)) %>%
  group_by(product, release_num) %>%
  mutate(dayssince = cumsum(cumsum(release)))
One comment: you ask for a solution that "iterates row by row." That isn't the idiomatic R way of doing things. R works on vectors, typically column vectors, so any literal row-wise solution requires a bit of a workaround. You could switch to something like SAS, which does explicitly work row-wise.
My solution uses the plyr library, although it's not vectorized. It may therefore be slower than some alternatives.
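# Given the row indices of the releases and a zero-filled output vector,
# fill in "dayssince". Rows before the first release stay 0 and each release
# row counts as day 1, matching the desired output in the question.
ds <- function(rel.dts, x) {
  n <- length(rel.dts)
  if (n == 0) return(x)  # no release for this product
  for (i in seq_len(n)) {
    # each block runs from a release to the row before the next release,
    # or to the last row for the final release
    end <- if (i < n) rel.dts[i + 1] - 1 else length(x)
    x[rel.dts[i]:end] <- seq_len(end - rel.dts[i] + 1)
  }
  return(x)
}
# use ds() on a given product
ds.prod <- function(dat) {
  dat <- dat[order(dat$date, decreasing = FALSE), ]
  rel.dts <- which(dat$release == 1)
  dat$dayssince <- ds(rel.dts, x = vector("integer", length = nrow(dat)))
  return(dat)
}
# split by product and run (x is the data frame from the question)
library(plyr)
x <- ddply(x, .variables = "product", .fun = ds.prod)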
If your data is coming from a database, it may be easier to create a view with a computed column used to calculate the days since release.
I am currently too tired to post any SQL code, but if it is an approach you would consider, I can provide some example code tomorrow.