I have a data frame that contains the payment schedule for customers, ordered by transaction date. I want to calculate the average number of consecutive failed payments and the average number of consecutive successful payments. The table looks like this:
customer_id | transaction_id. | failed_or_success | transaction_date
1           | 1               | success           | 2021-01-01
1           | 2               | success           | 2021-01-15
1           | 3               | failed            | 2021-01-30
1           | 4               | success           | 2021-02-15
For example, the average number of consecutive successful payments would be (2+1)/2 = 1.5: the 2 comes from transaction_id 1 & 2 (two successes in a row), and the 1 comes from transaction_id 4. The average number of consecutive failed payments would just be 1 in this example, because the only failed run is transaction_id 3. Eventually the table would look like this:
cus_id | tran_id. | f_or_s  | tran_date  | avg_consec_fail | avg_consec_success
1      | 1        | success | 2021-01-01 | 1               | 1.5
1      | 2        | success | 2021-01-15 | 1               | 1.5
1      | 3        | failed  | 2021-01-30 | 1               | 1.5
1      | 4        | success | 2021-02-15 | 1               | 1.5
How do I make this happen with R/dplyr?
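You may try using rle(), which run-length encodes a vector: it returns the length and the value of each run of consecutive identical elements.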
library(dplyr)

df <- read.table(text = "customer_id transaction_id. failed_or_success transaction_date
1 1 success 2021-01-01
1 2 success 2021-01-15
1 3 failed 2021-01-30
1 4 success 2021-02-15", header = TRUE)

# Run-length encode the status column, then average the lengths of the non-failed runs
df %>%
  mutate(avg_consec_success = mean(rle(failed_or_success)$lengths[rle(failed_or_success)$values != "failed"]))
  customer_id transaction_id. failed_or_success transaction_date avg_consec_success
1           1               1           success       2021-01-01                1.5
2           1               2           success       2021-01-15                1.5
3           1               3            failed       2021-01-30                1.5
4           1               4           success       2021-02-15                1.5
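The snippet above only computes the success average and treats the whole table as a single customer. A minimal sketch that extends the same rle() idea to both columns and to several customers could look like the following. The helper name avg_run_length is hypothetical (it is not part of dplyr or base R), and the sketch assumes the status column only contains "success" and "failed" and that the dates are ISO-formatted so they sort correctly as text.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): mean length of the
# runs in `x` whose value equals `status`.
avg_run_length <- function(x, status) {
  runs <- rle(as.character(x))
  lens <- runs$lengths[runs$values == status]
  # Return NA instead of NaN when the status never occurs for a customer
  if (length(lens) == 0) NA_real_ else mean(lens)
}

df <- read.table(text = "customer_id transaction_id failed_or_success transaction_date
1 1 success 2021-01-01
1 2 success 2021-01-15
1 3 failed 2021-01-30
1 4 success 2021-02-15", header = TRUE, stringsAsFactors = FALSE)

result <- df %>%
  # Runs only make sense in chronological order within each customer
  arrange(customer_id, transaction_date) %>%
  group_by(customer_id) %>%
  mutate(
    avg_consec_fail    = avg_run_length(failed_or_success, "failed"),
    avg_consec_success = avg_run_length(failed_or_success, "success")
  ) %>%
  ungroup()

print(result)
```

Because the helper returns a single number per group, mutate() recycles it to every row of that customer, which matches the desired output layout in the question. Use summarise() instead of mutate() if one row per customer is enough.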