
Faster function than ifelse() in R

I have three columns: Flag, Score, and Stage.

Flag is either 1 or 0, and Score can be any value above 0. We need to calculate the Stage values.

So our data (stagedata) looks like this:

              Flag Score Stage
               1    35
               1    0
               0    12
               ....

If Flag == 1 and Score >= 30, then Stage is calculated as 2.

If Flag == 0, or if Flag == 1 and Score < 30, then Stage is 1.

In any other case Stage is calculated as 0 (i.e., because of an error in the input, or because Score or Flag is missing).

        stagedata$Stage <- ifelse(stagedata$Flag == 1,
                                  ifelse(stagedata$Score >= 30, 2, 1),
                                  ifelse(stagedata$Flag == 0, 1, 0))
        stagedata$Stage[is.na(stagedata$Stage)] <- 0

Is there a more efficient way to do this using another function, such as apply? The data we are dealing with has on the order of tens of thousands of rows.

We can convert the logical vectors to numbers with a little arithmetic:

v1 <- with(stagedata, 2 *(Flag == 1 & score >= 30) + (Flag %in% 0:1 & score <30))
v1
#[1] 2 1 1 2 1 0

If there are NA values, replace them with 0:

v1[is.na(v1)] <- 0

data

stagedata <- data.frame(Flag = c(1, 1, 0, 1, 0, 2), score = c(35, 0, 12, 31, 27, 31))
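
One way to fold the fallback case into the calculation itself, rather than patching NA values afterwards, is to start every row at 0 and overwrite only the rows that match a rule. This is a minimal sketch and not part of the original answer: `compute_stage` and the `demo` data frame are hypothetical names, and it assumes that a missing Flag or Score should always give stage 0, as the question describes.

```r
# Hypothetical helper (not from the original answers): start every row at the
# fallback value 0, then overwrite only the rows that satisfy a valid rule.
compute_stage <- function(flag, score) {
  ok <- !is.na(flag) & !is.na(score)        # rows where both inputs are present
  stage <- numeric(length(flag))            # default: 0 for every row
  stage[ok & flag %in% 0:1] <- 1            # any valid flag starts at stage 1
  stage[ok & flag == 1 & score >= 30] <- 2  # flag 1 with a high score is stage 2
  stage
}

# Made-up rows: the sample data plus a missing flag and a flag-0 high score
demo <- data.frame(Flag  = c(1, 1, 0, 1, 0, 2, NA, 0),
                   score = c(35, 0, 12, 31, 27, 31, 10, 40))
demo$Stage <- compute_stage(demo$Flag, demo$score)
demo$Stage
# [1] 2 1 1 2 1 0 0 1
```

Rows with Flag == 0 get stage 1 whatever the score, which matches the nested ifelse() in the question.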

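The original answer and the fixed answer differ in speed by a factor of about 1.07, not 1.4, which is not a meaningful difference. Both are several times faster than the nested ifelse() in the benchmark below.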

N <- 10000
set.seed(1)
df <- data.frame(Flag = sample(0:1, N, replace = TRUE), Score = sample(c(12, 35), N, replace = TRUE))
head(df)
#   Flag Score
# 1    0    12
# 2    0    35
# 3    1    35
# 4    1    12
# 5    0    12
# 6    1    12

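# Nested ifelse() from the question. The assignment is local to the function;
# its value (the Stage vector) is what the function returns.
ifelse_approach <- function() {
  df$Stage <- ifelse(df$Flag == 1, ifelse(df$Score >= 30, 2, 1), ifelse(df$Flag == 0, 1, 0))
}

# Logical arithmetic as originally posted
lgl_approach <- function() {
  df$Stage <- with(df, 2 * (Flag == 1 & Score >= 30) + (Flag %in% 0:1 & Score < 30))
}

# Fixed logical arithmetic: Flag == 0 gives 1 regardless of Score
lgl_fix_approach <- function() {
  df$Stage <- with(df, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))
}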

identical(ifelse_approach(), lgl_approach())
# FALSE
identical(ifelse_approach(), lgl_fix_approach())
# TRUE
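
To see where the two logical versions part ways, it can help to evaluate all three expressions on every Flag/Score combination that occurs in the benchmark data. A small sketch (the `grid` data frame and its column names `ref`, `lgl`, and `fix` are made up for illustration):

```r
# Every Flag/Score combination present in the benchmark data
grid <- expand.grid(Flag = 0:1, Score = c(12, 35))

# The three expressions from the benchmark, evaluated side by side
grid$ref <- with(grid, ifelse(Flag == 1, ifelse(Score >= 30, 2, 1), ifelse(Flag == 0, 1, 0)))
grid$lgl <- with(grid, 2 * (Flag == 1 & Score >= 30) + (Flag %in% 0:1 & Score < 30))
grid$fix <- with(grid, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))

# Rows where the originally posted logical version disagrees with ifelse()
grid[grid$ref != grid$lgl, ]
#   Flag Score ref lgl fix
# 3    0    35   1   0   1
```

The only mismatch is Flag == 0 with Score >= 30: the `Flag %in% 0:1 & Score < 30` term is FALSE there, so `lgl` yields 0 where the nested ifelse() yields 1.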

library(microbenchmark)
microbenchmark(ifelse_approach(), lgl_approach(), lgl_fix_approach(), unit="relative", times=10L)

# Unit: relative
#                expr      min       lq     mean   median       uq       max neval
#   ifelse_approach() 5.949921 6.048253 5.714637 6.737770 7.186373 3.0478402    10
#      lgl_approach() 1.120431 1.111262 1.059140 1.274285 1.376115 0.5364108    10
#  lgl_fix_approach() 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000    10
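
The identical() check above was run on benchmark data that contains only Flag values 0 and 1 and no missing values. As a hedged check, assuming the question's fallback rule (stage 0 for invalid or missing input) still applies, the sketch below runs both versions on three made-up edge rows; `edge`, `ref`, and `fix` are hypothetical names.

```r
# Made-up edge cases: an invalid flag, a missing flag, a missing score
edge <- data.frame(Flag = c(2, NA, 1), Score = c(12, 12, NA))

# Nested ifelse() from the question, followed by its NA cleanup
ref <- with(edge, ifelse(Flag == 1, ifelse(Score >= 30, 2, 1), ifelse(Flag == 0, 1, 0)))
ref[is.na(ref)] <- 0

# Fixed logical-arithmetic version from the benchmark
fix <- with(edge, 2 * (Flag == 1 & Score >= 30) + (Flag == 0 | Score < 30))

ref
# [1] 0 0 0
fix
# [1]  1  1 NA
```

So the arithmetic shortcut looks like a drop-in replacement only once invalid and missing rows have been filtered out or handled separately.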
