
Faster function than ifelse() in R

I have 3 columns: Flag, Score and Stage.

Flag takes the values 1 or 0, and Score can be any value above 0. We need to calculate the Stage values.

So our data (stagedata) looks like this:

        Flag Score Stage
           1    35
           1     0
           0    12
         ...

If Flag == 1 and Score >= 30, Stage is 2.

If Flag == 0, or Flag == 1 and Score < 30, Stage is 1.

In any other case (for example an input error, or a missing Score or Flag), Stage is 0.

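        # Flag == 1 -> 2 or 1 depending on Score; Flag == 0 -> 1; any other Flag -> 0
        stagedata$Stage <- ifelse(stagedata$Flag == 1, ifelse(stagedata$Score >= 30, 2, 1), ifelse(stagedata$Flag == 0, 1, 0))
        # A missing Flag (or a missing Score when Flag == 1) gives NA; treat it as 0
        stagedata$Stage[is.na(stagedata$Stage)] <- 0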

Is there a more efficient way to do this, using some other function such as apply? The data we are dealing with is on the order of tens of thousands of rows.
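For comparison, an apply-style version calls a scalar function once per row. Below is a minimal sketch with mapply(), assuming columns named Flag and Score; stage_one() and stage_apply are hypothetical names. It follows the nested ifelse() above, including its handling of missing values (a row with Flag == 0 gets 1 even when Score is missing).

```r
# Hypothetical scalar helper: the Stage rules for a single row,
# mirroring the nested ifelse() plus the NA -> 0 replacement.
stage_one <- function(flag, score) {
  if (is.na(flag)) return(0)
  if (flag == 1) {
    if (is.na(score)) return(0)
    return(if (score >= 30) 2 else 1)
  }
  if (flag == 0) return(1)
  0  # any other Flag value
}

stagedata <- data.frame(Flag  = c(1, 1, 0, 1, 0, 2),
                        Score = c(35, 0, 12, 31, 27, 31))

# mapply() calls stage_one() once per row: an R-level loop, not vectorized
stage_apply <- mapply(stage_one, stagedata$Flag, stagedata$Score)
stage_apply
# [1] 2 1 1 2 1 0
```

Because mapply() loops over the rows in R code, this is generally slower than the vectorized ifelse(), so the apply family is not a way to speed this up.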

We can convert the logical vectors to numbers with some simple arithmetic:
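A quick illustration of the coercion this relies on: in arithmetic, TRUE counts as 1 and FALSE as 0, so 2 * (condition) gives 2 or 0, and adding a second condition adds 1 or 0. The three rows below are made up for illustration.

```r
flag  <- c(1, 1, 0)
score <- c(35, 0, 12)

high <- flag == 1 & score >= 30          # TRUE FALSE FALSE
low  <- flag %in% 0:1 & score < 30       # FALSE TRUE TRUE

as.integer(high)   # 1 0 0
2 * high           # 2 0 0
2 * high + low     # 2 1 1
```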

# 2 if Flag == 1 and score >= 30; 1 if Flag is 0 or 1 and score < 30; otherwise 0
v1 <- with(stagedata, 2 * (Flag == 1 & score >= 30) + (Flag %in% 0:1 & score < 30))
v1
#[1] 2 1 1 2 1 0

If there are NA values, replace them with 0:

v1[is.na(v1)] <- 0
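To see where the NA values come from: a comparison involving a missing Flag or score returns NA, and NA propagates through the arithmetic. The sketch below uses made-up rows and base R's replace(), which has the same effect as the assignment above in a single expression.

```r
stagedata <- data.frame(Flag  = c(1, NA, 1),
                        score = c(35, 35, NA))

v1 <- with(stagedata, 2 * (Flag == 1 & score >= 30) + (Flag %in% 0:1 & score < 30))
v1   # values: 2 NA NA

# Same effect as v1[is.na(v1)] <- 0
v1 <- replace(v1, is.na(v1), 0)
v1   # values: 2 0 0
```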

Data:

stagedata <- data.frame(Flag = c(1, 1, 0, 1, 0, 2), score = c(35, 0, 12, 31, 27, 31))

The original answer and the fixed answer differ in speed by a factor of about 1.07, not 1.4, which is not a meaningful difference. The fix is what makes the result identical to the question's ifelse() approach:

N <- 10000
set.seed(1)
df <- data.frame(Flag = sample(0:1, N, replace = TRUE), Score = sample(c(12, 35), N, replace = TRUE))
head(df)
#   Flag Score
# 1    0    12
# 2    0    35
# 3    1    35
# 4    1    12
# 5    0    12
# 6    1    12

# Each function returns the Stage vector it assigns (the value of its last expression)
ifelse_approach <- function() {
  df$Stage <- ifelse(df$Flag == 1, ifelse(df$Score >= 30, 2, 1), ifelse(df$Flag == 0, 1, 0))
}

lgl_approach <- function() {
  df$Stage <- with(df, 2 * (Flag == 1 & Score >= 30) + (Flag %in% 0:1 & Score < 30))
}

# Fixed: a row with Flag == 0 now gets 1 whatever its Score
lgl_fix_approach <- function() {
  df$Stage <- with(df, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))
}

identical(ifelse_approach(), lgl_approach())
# FALSE
identical(ifelse_approach(), lgl_fix_approach())
# TRUE
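The identical() check fails for the original expression because of rows with Flag == 0 and Score >= 30: the question assigns them Stage 1, but Flag %in% 0:1 & Score < 30 is FALSE there, so they get 0. A minimal check on two made-up rows (original and fixed are just labels):

```r
rows <- data.frame(Flag = c(0, 0), Score = c(12, 35))

original <- with(rows, 2 * (Flag == 1 & Score >= 30) + (Flag %in% 0:1 & Score < 30))
fixed    <- with(rows, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))

original   # 1 0  (second row gets 0)
fixed      # 1 1  (matches the nested ifelse())
```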

library(microbenchmark)
microbenchmark(ifelse_approach(), lgl_approach(), lgl_fix_approach(), unit="relative", times=10L)

# Unit: relative
#                expr      min       lq     mean   median       uq       max neval
#   ifelse_approach() 5.949921 6.048253 5.714637 6.737770 7.186373 3.0478402    10
#      lgl_approach() 1.120431 1.111262 1.059140 1.274285 1.376115 0.5364108    10
#  lgl_fix_approach() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000    10
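One caveat: lgl_fix_approach() matches the nested ifelse() here because the benchmark data only contains Flag values 0 and 1. For a row with any other Flag value (the sample data above includes Flag = 2) and Score < 30, Flag == 0 | Score < 30 is TRUE, giving Stage 1 instead of 0. Below is a sketch of a variant that also covers that case, assuming the same Flag/Score column names; stage_vec() and check_rows are hypothetical names, and the rows are made up. It is checked against the question's nested ifelse() plus the NA replacement.

```r
# Hypothetical vectorized helper following the question's rules:
#   Flag == 1 & Score >= 30               -> 2
#   Flag == 0, or Flag == 1 & Score < 30  -> 1
#   anything else (other Flag values, NA) -> 0
stage_vec <- function(Flag, Score) {
  s <- 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | (Flag == 1 & Score < 30))
  s[is.na(s)] <- 0
  s
}

check_rows <- data.frame(Flag  = c(1, 1, 0, 0, 2, 2, NA, 1),
                         Score = c(35, 0, 12, 40, 31, 12, 35, NA))

# Reference: the question's nested ifelse() plus NA -> 0
ref <- with(check_rows, ifelse(Flag == 1, ifelse(Score >= 30, 2, 1), ifelse(Flag == 0, 1, 0)))
ref[is.na(ref)] <- 0

identical(with(check_rows, stage_vec(Flag, Score)), ref)   # TRUE

# The benchmark's fixed expression gives 1 for row 6 (Flag = 2, Score = 12)
with(check_rows, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))[6]   # 1
```

The extra Flag == 1 condition in the second term costs one more comparison, so it may be slightly slower than lgl_fix_approach(); whether that matters depends on whether Flag values other than 0 and 1 can actually occur in your data.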

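Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.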

© 2020-2024 STACKOOM.COM