How to get the average number of consecutive occurrences that meet a certain condition in R
I have a data set that contains a payment schedule for customers, ordered by transaction date. I want to calculate the average number of consecutive failed payments and the average number of consecutive successful payments. The table looks like this:
customer_id |transaction_id.|failed_or_success | transaction_date
1 |1 |success |2021-01-01
1 |2 |success |2021-01-15
1 |3 |failed |2021-01-30
1 |4 |success |2021-02-15
For example, the average number of consecutive successful payments would be (2+1)/2 = 1.5: the 2 comes from transaction_id 1 and 2, and the 1 comes from transaction_id 4. The average number of consecutive failed payments would be just 1 in this example, because the only failed run is transaction_id 3.

Eventually the table would look like this:
cus_id |tran_id.|f_or_s |tran_date |avg_consec_fail|avg_consec_success
1 |1 |success|2021-01-01 |1 |1.5
1 |2 |success|2021-01-15 |1 |1.5
1 |3 |failed |2021-01-30 |1 |1.5
1 |4 |success|2021-02-15 |1 |1.5
How do I make this happen with R/dplyr?
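You can try using rle (run-length encoding), which splits a vector into runs of consecutive equal values and reports the length of each run.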
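To make the idea concrete, here is a minimal sketch of what rle returns for the status column from the question, written as a plain character vector. The variable names `status` and `runs` are only illustrative.

```r
# Status values from the question, in transaction order
status <- c("success", "success", "failed", "success")

# rle() returns the length of each run and the value that run holds
runs <- rle(status)
runs$lengths  # 2 1 1
runs$values   # "success" "failed" "success"

# Average length of the "success" runs: (2 + 1) / 2
avg_success <- mean(runs$lengths[runs$values == "success"])
avg_success   # 1.5
```

The same expression can be used on a data frame column inside mutate.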
df <- read.table(text = "customer_id transaction_id. failed_or_success transaction_date
1 1 success 2021-01-01
1 2 success 2021-01-15
1 3 failed 2021-01-30
1 4 success 2021-02-15", header = TRUE)
library(dplyr)

# rle() collapses the column into runs; average the lengths of the non-failed runs
df %>%
  mutate(avg_consec_success = mean(rle(failed_or_success)$lengths[rle(failed_or_success)$values != "failed"]))
customer_id transaction_id. failed_or_success transaction_date avg_consec_success
1 1 1 success 2021-01-01 1.5
2 1 2 success 2021-01-15 1.5
3 1 3 failed 2021-01-30 1.5
4 1 4 success 2021-02-15 1.5
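The answer above computes only the success average over the whole table. A possible extension, sketched below under the assumption that the averages should be computed per customer, adds the failed average as well and groups by customer_id. The helper `avg_run_length` is a hypothetical name introduced here, not part of dplyr or base R, and the column is called transaction_id (without the trailing dot) for convenience.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  customer_id = c(1, 1, 1, 1),
  transaction_id = c(1, 2, 3, 4),
  failed_or_success = c("success", "success", "failed", "success"),
  transaction_date = as.Date(c("2021-01-01", "2021-01-15",
                               "2021-01-30", "2021-02-15")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean length of the runs of `status` equal to `value`.
# Returns NA when `value` never occurs in `status`.
avg_run_length <- function(status, value) {
  runs <- rle(as.character(status))
  run_lengths <- runs$lengths[runs$values == value]
  if (length(run_lengths) == 0) NA_real_ else mean(run_lengths)
}

result <- df %>%
  arrange(customer_id, transaction_date) %>%  # runs depend on row order
  group_by(customer_id) %>%                   # one set of averages per customer
  mutate(
    avg_consec_fail    = avg_run_length(failed_or_success, "failed"),
    avg_consec_success = avg_run_length(failed_or_success, "success")
  ) %>%
  ungroup()

print(result)
```

For the sample data this should give avg_consec_fail = 1 and avg_consec_success = 1.5 on every row, matching the table requested in the question. Sorting before grouping matters because rle only sees runs in the order the rows appear.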