简体   繁体   中英

Conditional operations with dplyr in R

I have the following data frame df

CustID  Mode_Payment Payment      Expiry       Amount
100      ECS         2015-01-01   2015-03-01    1000
200      Online      2015-01-01   2015-05-01    2000
100      ECS         2015-01-01   2015-10-01    3000
300      Cash        2015-01-01   2015-05-01    5000

I want to calculate a new field subscription period which is period<-as.numberic(expiry-payment).

But, when the Mode of payment is ECS then period should be calculated by the following formula :

group_by(CustID)
period<-max(expiry)-min(payment)
ugroup()

So for the above data set output should be

CustID  Mode_Payment       Payment      Expiry      Amount   Period
    100      ECS         2015-01-01   2015-03-01    1000      273
    200      Online      2015-01-01   2015-05-01    2000      120 
    100      ECS         2015-01-01   2015-10-01    3000      273
    300      Cash        2015-01-01   2015-05-01    5000      120

Unfortunately, I'm getting all kind of errors.

df<-df %>%
  group_by(custid) %>%
  if(mode_payement=='ECS') {mutate(period=(as.numeric(max(expiry)-min(payement))))
                      } else mutate(period=as.numeric((expiry-payment)))  %>%
  ungroup()

I modified your data a bit in case you have ECS and something else for a customer ID. I chose to use subsetting rather an using ifelse in my approach. You have one operation for data with ECS only and the other for the rest.

DATA & CODE

mydf <- read.table(text = "CustID  Mode_Payment Payment      Expiry       Amount
100      ECS         2015-01-01   2015-03-01    1000
200      Online      2015-01-01   2015-05-01    2000
100      ECS         2015-01-01   2015-10-01    3000
300      Cash        2015-01-01   2015-05-01    5000
100      Online         2015-01-01   2015-07-01    7000", header = T, stringsAsFactors = FALSE)

  CustID Mode_Payment    Payment     Expiry Amount
1    100          ECS 2015-01-01 2015-03-01   1000
2    200       Online 2015-01-01 2015-05-01   2000
3    100          ECS 2015-01-01 2015-10-01   3000
4    300         Cash 2015-01-01 2015-05-01   5000
5    100       Online 2015-01-01 2015-07-01   7000


library(dplyr)
library(data.table)

#Set Payment and Expiry as Date.
setDT(mydf)[, c("Payment", "Expiry") := lapply(.SD, as.IDate), .SDcols = 3:4]


x <- mydf[Mode_Payment == "ECS"][, period := max(Expiry) - min(Payment), by = CustID]

y <- mydf[Mode_Payment != "ECS"][, period := Expiry - Payment, by = CustID]

rbindlist(list(x, y))

#   CustID Mode_Payment    Payment     Expiry Amount   period
#1:    100          ECS 2015-01-01 2015-03-01   1000 273 days
#2:    100          ECS 2015-01-01 2015-10-01   3000 273 days
#3:    200       Online 2015-01-01 2015-05-01   2000 120 days
#4:    300         Cash 2015-01-01 2015-05-01   5000 120 days
#5:    100       Online 2015-01-01 2015-07-01   7000 181 days

### dplyr way

filter(mydf, Mode_Payment == "ECS") %>%
group_by(CustID) %>%
mutate(period = max(Expiry) - min(Payment)) -> x

filter(mydf, Mode_Payment != "ECS") %>%
mutate(period = Expiry - Payment) -> y

bind_rows(x, y)

Or dplyr with ifelse:

data %>%
  group_by(CustID) %>%
  mutate_each(funs(as.Date), Expiry, Payment) %>%
  mutate(period = 
           (Mode_Payment == "ECS") %>%
           ifelse(
             max(Expiry) - min(Payment),
             Expiry - Payment) )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM