
Conditional operations with dplyr in R

I have the following data frame df:

CustID  Mode_Payment Payment      Expiry       Amount
100      ECS         2015-01-01   2015-03-01    1000
200      Online      2015-01-01   2015-05-01    2000
100      ECS         2015-01-01   2015-10-01    3000
300      Cash        2015-01-01   2015-05-01    5000

I want to calculate a new field, the subscription period, defined as period <- as.numeric(expiry - payment).

But when the mode of payment is ECS, the period should be calculated per customer with the following formula:

group_by(CustID)
period <- max(expiry) - min(payment)
ungroup()

So for the above data set, the output should be:

CustID  Mode_Payment       Payment      Expiry      Amount   Period
    100      ECS         2015-01-01   2015-03-01    1000      273
    200      Online      2015-01-01   2015-05-01    2000      120 
    100      ECS         2015-01-01   2015-10-01    3000      273
    300      Cash        2015-01-01   2015-05-01    5000      120
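For reference, the expected values can be checked by hand in base R. This is only a minimal sketch with the dates copied from the table above; the variable names `ecs_period` and `other_period` are made up for illustration.

```r
# All rows share the same payment date.
payment <- as.Date("2015-01-01")

# ECS rows for CustID 100: latest expiry minus earliest payment.
ecs_period <- as.numeric(max(as.Date(c("2015-03-01", "2015-10-01"))) - payment)

# Non-ECS rows (CustID 200 and 300): row-wise expiry minus payment.
other_period <- as.numeric(as.Date("2015-05-01") - payment)

print(c(ecs = ecs_period, other = other_period))
```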

Unfortunately, I'm getting all kinds of errors with the following attempt:

df<-df %>%
  group_by(custid) %>%
  if(mode_payement=='ECS') {mutate(period=(as.numeric(max(expiry)-min(payement))))
                      } else mutate(period=as.numeric((expiry-payment)))  %>%
  ungroup()

I modified your data a bit to cover the case where a customer ID has both ECS and some other mode of payment. In my approach I chose subsetting rather than ifelse: one operation handles the ECS rows only, and another handles the rest.

DATA & CODE

mydf <- read.table(text = "CustID  Mode_Payment Payment      Expiry       Amount
100      ECS         2015-01-01   2015-03-01    1000
200      Online      2015-01-01   2015-05-01    2000
100      ECS         2015-01-01   2015-10-01    3000
300      Cash        2015-01-01   2015-05-01    5000
100      Online         2015-01-01   2015-07-01    7000", header = T, stringsAsFactors = FALSE)

  CustID Mode_Payment    Payment     Expiry Amount
1    100          ECS 2015-01-01 2015-03-01   1000
2    200       Online 2015-01-01 2015-05-01   2000
3    100          ECS 2015-01-01 2015-10-01   3000
4    300         Cash 2015-01-01 2015-05-01   5000
5    100       Online 2015-01-01 2015-07-01   7000


library(dplyr)
library(data.table)

#Set Payment and Expiry as Date.
setDT(mydf)[, c("Payment", "Expiry") := lapply(.SD, as.IDate), .SDcols = 3:4]


x <- mydf[Mode_Payment == "ECS"][, period := max(Expiry) - min(Payment), by = CustID]

y <- mydf[Mode_Payment != "ECS"][, period := Expiry - Payment, by = CustID]

rbindlist(list(x, y))

#   CustID Mode_Payment    Payment     Expiry Amount   period
#1:    100          ECS 2015-01-01 2015-03-01   1000 273 days
#2:    100          ECS 2015-01-01 2015-10-01   3000 273 days
#3:    200       Online 2015-01-01 2015-05-01   2000 120 days
#4:    300         Cash 2015-01-01 2015-05-01   5000 120 days
#5:    100       Online 2015-01-01 2015-07-01   7000 181 days

### dplyr way

filter(mydf, Mode_Payment == "ECS") %>%
group_by(CustID) %>%
mutate(period = max(Expiry) - min(Payment)) -> x

filter(mydf, Mode_Payment != "ECS") %>%
mutate(period = Expiry - Payment) -> y

bind_rows(x, y)
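
Binding the two subsets puts the ECS rows first, so the original row order is lost (compare the output above with the input). If that matters, one possible workaround is to tag each row before splitting and sort afterwards. The sketch below assumes a helper column named `row_id`, which is hypothetical and not part of the original data, and uses plain `Date` columns instead of `IDate`.

```r
library(dplyr)

# Same sample data as above, rebuilt so the block is self-contained.
mydf <- read.table(text = "CustID Mode_Payment Payment Expiry Amount
100 ECS 2015-01-01 2015-03-01 1000
200 Online 2015-01-01 2015-05-01 2000
100 ECS 2015-01-01 2015-10-01 3000
300 Cash 2015-01-01 2015-05-01 5000
100 Online 2015-01-01 2015-07-01 7000", header = TRUE, stringsAsFactors = FALSE)

mydf <- mydf %>%
  mutate(Payment = as.Date(Payment),
         Expiry  = as.Date(Expiry),
         row_id  = row_number())   # hypothetical helper column

# ECS rows: one period per customer.
ecs <- mydf %>%
  filter(Mode_Payment == "ECS") %>%
  group_by(CustID) %>%
  mutate(period = as.numeric(max(Expiry) - min(Payment))) %>%
  ungroup()

# All other rows: row-wise difference.
other <- mydf %>%
  filter(Mode_Payment != "ECS") %>%
  mutate(period = as.numeric(Expiry - Payment))

# Recombine, restore the input order, and drop the helper column.
res <- bind_rows(ecs, other) %>%
  arrange(row_id) %>%
  select(-row_id)

print(res)
```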

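Or dplyr with ifelse:

# across() needs dplyr >= 1.0; it replaces the deprecated mutate_each()/funs().
mydf %>%
  mutate(across(c(Expiry, Payment), as.Date)) %>%
  group_by(CustID) %>%
  # as.numeric() keeps the result in days; ifelse() drops the difftime class.
  mutate(period = ifelse(Mode_Payment == "ECS",
                         as.numeric(max(Expiry) - min(Payment)),
                         as.numeric(Expiry - Payment))) %>%
  ungroup()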
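
One caveat with the ifelse version: because it groups by CustID only, max(Expiry) and min(Payment) are taken over all of a customer's rows, including non-ECS ones, whereas the subsetting approach looks at ECS rows only. On the sample data both give the same periods. A possible variant, sketched here under the assumption that the ECS aggregate should ignore other payment modes, groups on both columns and uses dplyr's `if_else`:

```r
library(dplyr)

# Same sample data as in the first answer, rebuilt so the block is self-contained.
mydf <- read.table(text = "CustID Mode_Payment Payment Expiry Amount
100 ECS 2015-01-01 2015-03-01 1000
200 Online 2015-01-01 2015-05-01 2000
100 ECS 2015-01-01 2015-10-01 3000
300 Cash 2015-01-01 2015-05-01 5000
100 Online 2015-01-01 2015-07-01 7000", header = TRUE, stringsAsFactors = FALSE)

res <- mydf %>%
  mutate(Payment = as.Date(Payment), Expiry = as.Date(Expiry)) %>%
  # Grouping by payment mode as well keeps non-ECS rows out of max()/min().
  group_by(CustID, Mode_Payment) %>%
  mutate(period = if_else(Mode_Payment == "ECS",
                          as.numeric(max(Expiry) - min(Payment)),
                          as.numeric(Expiry - Payment))) %>%
  ungroup()

print(res)
```

Unlike the split-and-bind approach, this keeps the rows in their input order.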

