
Efficient alternative to for loop of ifelse

Would someone be so kind as to suggest a fast (vectorised or data.table) solution?

Here is some sample data and my current loop:

set.seed(123)
df <- data.frame(x=abs(rnorm(10)*100),y=abs(rnorm(10)*100))
df$ratio <- df$x/df$y

for(i in 1:nrow(df)) {
df[,c("x")][i] <- ifelse(df[,c("ratio")][i]>1.5,1.5*df[,c("y")][i],df[,c("x")][i])
}

So I'm replacing the values in column x wherever the ratio is greater than 1.5.

Try:

library(data.table)
setDT(df)[ratio > 1.5, x := 1.5*y]

Or a base R solution:

transform(df, x=ifelse(ratio > 1.5, 1.5*y, x))
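One detail worth spelling out: unlike the data.table `:=` call, which updates by reference, `transform()` returns a new data frame, so its result has to be assigned to take effect. The following is a minimal sketch using only base R and the sample data from the question, checking that the vectorised version reproduces the loop. The names `df_loop` and `df_vec` are introduced here purely for illustration.

```r
set.seed(123)
df <- data.frame(x = abs(rnorm(10) * 100), y = abs(rnorm(10) * 100))
df$ratio <- df$x / df$y

# Reference result: a row-by-row loop equivalent to the one in the question,
# run on a copy so that df itself stays unchanged
df_loop <- df
for (i in seq_len(nrow(df_loop))) {
  if (df_loop$ratio[i] > 1.5) df_loop$x[i] <- 1.5 * df_loop$y[i]
}

# transform() does not modify df; keep the returned data frame
df_vec <- transform(df, x = ifelse(ratio > 1.5, 1.5 * y, x))

# Both approaches should produce the same x column
stopifnot(isTRUE(all.equal(df_loop$x, df_vec$x)))
```

To overwrite the original, assign back to the same name: `df <- transform(df, ...)`.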

Benchmarks

library(data.table)
library(dplyr)           # for mutate() in f4
library(microbenchmark)

set.seed(123)
df1 <- data.frame(x=abs(rnorm(1e6)*100),y=abs(rnorm(1e6)*100))
df1$ratio <-df1$x/df1$y
f1 <- function(){ as.data.table(df1)[ratio > 1.5, x:= 1.5*y]}

f2 <- function(){ indx <- df1$ratio > 1.5
                df1$x[indx] <- df1$y[indx]*1.5}

f3 <- function(){transform(df1, x=ifelse(ratio > 1.5, 1.5*y, x))}

#Another option suggested by @Steven Beaupré
f4 <- function(){mutate(df1, x=ifelse(ratio > 1.5, 1.5*y, x))}

 microbenchmark(f1(),f2(),f3(),f4(), unit='relative', times=40L)
 #Unit: relative
 #expr       min        lq     mean    median       uq      max neval cld
 #f1()  1.000000  1.000000 1.000000  1.000000 1.000000 1.000000    40 a  
 #f2()  2.829316  2.749836 2.047366  2.645489 1.301990 2.070973    40 b 
 #f3() 12.693416 12.954443 9.060991 12.689862 6.170528 7.935411    40 c
 #f4() 13.231567 13.300574 9.147105 12.636984 6.217343 6.286193    40 c

Another option in base R:

indx <- df$ratio > 1.5
df$x[indx] <- df$y[indx] * 1.5

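This avoids ifelse altogether and will likely be pretty fast even with relatively big data sets. In the benchmark above, the same approach (f2) had a median time about 2.6 times that of data.table and roughly a fifth of the ifelse-based versions.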
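A caveat, assuming your real data can contain missing ratios (the sample data cannot): a logical index containing NA makes the subscripted assignment fail when the replacement has more than one element. Wrapping the comparison in `which()` drops the NA cases. The sketch below uses a hypothetical helper, `cap_x`, which is not part of the original answers.

```r
# Hypothetical helper (not from the original answers): cap x at limit * y
cap_x <- function(d, limit = 1.5) {
  # which() keeps only positions where the comparison is TRUE,
  # so rows with an NA/NaN ratio are left untouched
  idx <- which(d$ratio > limit)
  d$x[idx] <- d$y[idx] * limit
  d
}

d <- data.frame(x = c(10, 400, 0), y = c(100, 100, 0))
d$ratio <- d$x / d$y   # third ratio is 0/0, i.e. NaN
capped <- cap_x(d)
capped$x               # only the second row exceeds the limit and is capped
```

Because the function returns the modified copy, the caller decides whether to overwrite the original data frame.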
