
R: finding the average order interval (number of days)

My goal is to obtain the average number of days between purchases of a given product. If Product_A is purchased three times over a given period ('2012-12-01', '2012-12-05', '2012-12-10'), then its average order interval is the average of 4 and 5, i.e. 4.5 days.
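As a quick check of that arithmetic, here is a minimal base R sketch using the three Product_A dates from the example: diff() on a Date vector gives the gaps in days, and mean() averages them.

```r
# Order dates for Product_A from the example above
dates <- as.Date(c("2012-12-01", "2012-12-05", "2012-12-10"))

# Days between consecutive orders (diff() on Dates returns a difftime in days)
gaps <- as.numeric(diff(dates))

# Average order interval: (4 + 5) / 2
avg_interval <- mean(gaps)
print(gaps)          # 4 5
print(avg_interval)  # 4.5
```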

I wrote a for loop to calculate the interval between consecutive orders (I can then use the aggregate function to calculate the mean or median by product), but I keep getting a length error. This needs to be a scalable solution.

Here is a sample dataframe:

product_info <- data.frame(productId = c("A", "A", "A", "B","B","B"), 
                           order_date = c("2014-05-01", "2014-05-05", "2014-05-10", "2014-06-01","2014-06-07", "2014-06-18"), stringsAsFactors=FALSE)

Here is my code:

 for (i in 2:length(unique(product_info$productId))){
  if(product_info$productId[i]==product_info$productId[i-1]){
    product_info$interval[i] <- as.integer(difftime(product_info$order_date[i],product_info$order_date[i-1]))
  }
}
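Two things in that loop are likely to cause trouble. The upper bound length(unique(product_info$productId)) is the number of distinct products (2 here), not the number of rows, so the loop stops after the second row. Also, the interval column does not exist before the loop, so each product_info$interval[i] <- assignment builds a short vector that R must then fit to the data frame. That is one plausible source of a "replacement has ... rows" length error. A minimal corrected sketch, assuming it is acceptable to sort the rows by product and date first:

```r
product_info <- data.frame(
  productId  = c("A", "A", "A", "B", "B", "B"),
  order_date = c("2014-05-01", "2014-05-05", "2014-05-10",
                 "2014-06-01", "2014-06-07", "2014-06-18"),
  stringsAsFactors = FALSE
)
product_info$order_date <- as.Date(product_info$order_date)

# Sort by product, then date, so each product's orders are adjacent and chronological
product_info <- product_info[order(product_info$productId, product_info$order_date), ]

# Create the full-length column before the loop; each product's first order keeps 0
product_info$interval <- 0

# Loop over rows (not over distinct products), starting at the second row
for (i in seq_len(nrow(product_info))[-1]) {
  if (product_info$productId[i] == product_info$productId[i - 1]) {
    product_info$interval[i] <- as.integer(product_info$order_date[i] -
                                           product_info$order_date[i - 1])
  }
}
product_info
```

The vectorised answers are still preferable for large data, since the loop copies the data frame on every assignment.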

My desired output should be:

product_info <- data.frame(productId = c("A", "A", "A", "B","B","B"), 
                           order_date = c("2014-05-01", "2014-05-05", "2014-05-10", "2014-06-01","2014-06-07", "2014-06-18"), 
                           interval= c("0", "4", "5", "0","6","11"), stringsAsFactors=FALSE)

Any help would be very much appreciated.

Thank you

You can try

  product_info$order_date <- as.Date(product_info$order_date)

  product_info$interval <- with(product_info, ave(as.numeric(order_date), 
                  productId, FUN=function(x) c(0, diff(x))))
  product_info
  productId order_date interval
1         A 2014-05-01        0
2         A 2014-05-05        4
3         A 2014-05-10        5
4         B 2014-06-01        0
5         B 2014-06-07        6
6         B 2014-06-18       11
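To get from this column to the average order interval the question asks for, the aggregate() step mentioned in the question can be applied, with one caveat. The leading 0 of each product is a placeholder rather than a real gap, so averaging it in would understate the interval. The sketch below drops each product's first row with duplicated() before averaging. It assumes the rows are already sorted by date within each product, as in the sample data.

```r
product_info <- data.frame(
  productId  = c("A", "A", "A", "B", "B", "B"),
  order_date = c("2014-05-01", "2014-05-05", "2014-05-10",
                 "2014-06-01", "2014-06-07", "2014-06-18"),
  stringsAsFactors = FALSE
)
product_info$order_date <- as.Date(product_info$order_date)

# Per-product gaps, as in the answer above (0 for each product's first order)
product_info$interval <- with(product_info, ave(as.numeric(order_date),
                              productId, FUN = function(x) c(0, diff(x))))

# Drop each product's first row: its 0 is a placeholder, not a real gap
gaps <- product_info[duplicated(product_info$productId), ]

# Mean and median order interval per product (A: 4.5, B: 8.5 for both here)
avg_interval    <- aggregate(interval ~ productId, data = gaps, FUN = mean)
median_interval <- aggregate(interval ~ productId, data = gaps, FUN = median)
avg_interval
```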

Or using data.table

 library(data.table)#v1.9.5+
 setDT(product_info)[,interval := c(0, diff(as.Date(order_date))) , productId]

If 'order_date' is not sorted, we have to order it before taking the 'diff':

 setDT(product_info)[, order_date:= as.Date(order_date)
           ][order(order_date), interval :=as.numeric(order_date -
           shift(order_date, fill=order_date[1L])) , by = productId]
 #    productId order_date interval
 #1:         A 2014-05-01        0
 #2:         A 2014-05-05        4
 #3:         A 2014-05-10        5
 #4:         B 2014-06-01        0
 #5:         B 2014-06-07        6
 #6:         B 2014-06-18       11
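For readers not using data.table, the same ordering step can be done in base R before the ave() call from the first answer. Here is a sketch with the sample rows deliberately shuffled (this row order is made up for illustration). It also shows the negative gaps that appear if the sort is skipped.

```r
product_info <- data.frame(
  productId  = c("B", "A", "A", "B", "A", "B"),
  order_date = as.Date(c("2014-06-18", "2014-05-05", "2014-05-01",
                         "2014-06-01", "2014-05-10", "2014-06-07")),
  stringsAsFactors = FALSE
)

# Without sorting, diff() follows row order and produces negative "intervals"
unsorted <- with(product_info,
  ave(as.numeric(order_date), productId, FUN = function(x) c(0, diff(x))))
unsorted  # 0 0 -4 -17 9 6

# Sort by product, then date, so diff() sees each product's orders chronologically
product_info <- product_info[order(product_info$productId, product_info$order_date), ]
product_info$interval <- with(product_info,
  ave(as.numeric(order_date), productId, FUN = function(x) c(0, diff(x))))
product_info
```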

Convert to date format:

product_info$order_date <- as.Date(product_info$order_date)

Using dplyr:

library(dplyr)
product_info %>% group_by(productId) %>%
                 mutate(interval=c(0,diff(order_date)))

Here is a dplyr solution. First convert 'order_date' to the Date format, then sort by date, group by product, and finally add a column holding the difference between each order date and the previous order date for the same product. Note that the first interval of each product is NA rather than 0, which IMHO is more appropriate, since there is no earlier order to measure from.

library(dplyr)
product_info <- product_info %>%
    mutate(order_date=as.Date(order_date)) %>%
    arrange(order_date) %>%
    group_by(productId) %>%
    mutate(interval=order_date-lag(order_date))

product_info
  productId order_date interval
1         A 2014-05-01  NA days
2         A 2014-05-05   4 days
3         A 2014-05-10   5 days
4         B 2014-06-01  NA days
5         B 2014-06-07   6 days
6         B 2014-06-18  11 days
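The reason NA is the better placeholder shows up once the per-product average is taken: mean(..., na.rm = TRUE) skips the NA, whereas a 0 is counted as a real gap. Here is a small base R sketch using the interval values from the output above, as plain numbers rather than difftime.

```r
# Intervals from the output above: NA marks each product's first order
interval  <- c(NA, 4, 5, NA, 6, 11)
productId <- c("A", "A", "A", "B", "B", "B")

# NA placeholders are skipped with na.rm = TRUE, giving the true average gap
with_na <- tapply(interval, productId, mean, na.rm = TRUE)

# The same data with 0 placeholders would pull each mean down
with_zero <- tapply(replace(interval, is.na(interval), 0), productId, mean)

with_na    # A = 4.5, B = 8.5
with_zero  # A = 3,   B = 17/3 (about 5.67)
```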
