Conditional operations with dplyr in R
I have the following data frame df:
CustID Mode_Payment Payment Expiry Amount
100 ECS 2015-01-01 2015-03-01 1000
200 Online 2015-01-01 2015-05-01 2000
100 ECS 2015-01-01 2015-10-01 3000
300 Cash 2015-01-01 2015-05-01 5000
I want to calculate a new field, the subscription period, which is period <- as.numeric(Expiry - Payment).
But when the mode of payment is ECS, the period should be calculated per customer with the following formula:
group_by(CustID)
period <- max(Expiry) - min(Payment)
ungroup()
So for the above data set, the output should be:
CustID Mode_Payment Payment Expiry Amount Period
100 ECS 2015-01-01 2015-03-01 1000 273
200 Online 2015-01-01 2015-05-01 2000 120
100 ECS 2015-01-01 2015-10-01 3000 273
300 Cash 2015-01-01 2015-05-01 5000 120
Unfortunately, I'm getting all kinds of errors with this attempt:
df<-df %>%
group_by(custid) %>%
if(mode_payement=='ECS') {mutate(period=(as.numeric(max(expiry)-min(payement))))
} else mutate(period=as.numeric((expiry-payment))) %>%
ungroup()
I modified your data a bit to cover the case where a customer ID has both ECS and another payment mode. I chose to use subsetting rather than ifelse in my approach: one operation handles the ECS rows only, and another handles the rest.
DATA & CODE
mydf <- read.table(text = "CustID Mode_Payment Payment Expiry Amount
100 ECS 2015-01-01 2015-03-01 1000
200 Online 2015-01-01 2015-05-01 2000
100 ECS 2015-01-01 2015-10-01 3000
300 Cash 2015-01-01 2015-05-01 5000
100 Online 2015-01-01 2015-07-01 7000", header = TRUE, stringsAsFactors = FALSE)
CustID Mode_Payment Payment Expiry Amount
1 100 ECS 2015-01-01 2015-03-01 1000
2 200 Online 2015-01-01 2015-05-01 2000
3 100 ECS 2015-01-01 2015-10-01 3000
4 300 Cash 2015-01-01 2015-05-01 5000
5 100 Online 2015-01-01 2015-07-01 7000
library(dplyr)
library(data.table)
#Set Payment and Expiry as Date.
setDT(mydf)[, c("Payment", "Expiry") := lapply(.SD, as.IDate), .SDcols = 3:4]
x <- mydf[Mode_Payment == "ECS"][, period := max(Expiry) - min(Payment), by = CustID]
y <- mydf[Mode_Payment != "ECS"][, period := Expiry - Payment, by = CustID]
rbindlist(list(x, y))
# CustID Mode_Payment Payment Expiry Amount period
#1: 100 ECS 2015-01-01 2015-03-01 1000 273 days
#2: 100 ECS 2015-01-01 2015-10-01 3000 273 days
#3: 200 Online 2015-01-01 2015-05-01 2000 120 days
#4: 300 Cash 2015-01-01 2015-05-01 5000 120 days
#5: 100 Online 2015-01-01 2015-07-01 7000 181 days
### dplyr way
filter(mydf, Mode_Payment == "ECS") %>%
group_by(CustID) %>%
mutate(period = max(Expiry) - min(Payment)) -> x
filter(mydf, Mode_Payment != "ECS") %>%
mutate(period = Expiry - Payment) -> y
bind_rows(x, y)
Or dplyr with ifelse:
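# mydf is the data frame defined above (the original snippet used an undefined `data`)
mydf %>%
  group_by(CustID) %>%
  # convert both date columns to Date before subtracting
  mutate_each(funs(as.Date), Expiry, Payment) %>%
  mutate(period =
    (Mode_Payment == "ECS") %>%
    ifelse(
      max(Expiry) - min(Payment),  # ECS rows: span over the whole customer group
      Expiry - Payment) )          # other rows: row-wise difference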
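mutate_each() and funs() are deprecated in recent dplyr releases, so the same idea can be sketched with across() and if_else(). This is only a sketch, assuming dplyr 1.0.0 or later. Grouping by both CustID and Mode_Payment is an assumption made here, not something the answers above do; it keeps the ECS span restricted to a customer's ECS rows, which matches the subsetting approach. The name `result` is just illustrative.

```r
library(dplyr)

# Same sample data as above, including the extra Online row for customer 100.
mydf <- read.table(text = "CustID Mode_Payment Payment Expiry Amount
100 ECS 2015-01-01 2015-03-01 1000
200 Online 2015-01-01 2015-05-01 2000
100 ECS 2015-01-01 2015-10-01 3000
300 Cash 2015-01-01 2015-05-01 5000
100 Online 2015-01-01 2015-07-01 7000", header = TRUE, stringsAsFactors = FALSE)

result <- mydf %>%
  # Convert the two date columns from character to Date.
  mutate(across(c(Payment, Expiry), as.Date)) %>%
  # Assumption: group by customer AND payment mode, so that max/min for
  # ECS rows only look at that customer's ECS rows.
  group_by(CustID, Mode_Payment) %>%
  mutate(period = if_else(
    Mode_Payment == "ECS",
    as.numeric(max(Expiry) - min(Payment), units = "days"),  # span per group
    as.numeric(Expiry - Payment, units = "days")             # row-wise
  )) %>%
  ungroup()

print(result)
```

Converting both branches to plain numbers before if_else() avoids mixing types, since if_else() is stricter than ifelse() about its two branches agreeing. Unlike the subsetting versions, this keeps the rows in their original order.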