
R: match elements between vectors - how to optimize code

Let's imagine we have a data frame describing several individuals:

des <- c('mad', 'crazy','stupid', 'crazy','wise','dumb','mad','furious')
id <- c(1,2,3,4,5,6,7,8)
d <-data.frame(id,des)
d$dangerous <- NA
dan <-c('mad','crazy','furious')

We want to flag every row whose d$des value appears in the vector dan.

I wrote the following nested loop:

for (i in 1:nrow(d)) {
  for (j in 1:length(dan)) {
    if (d$des[i] == dan[j]) {
      d$dangerous[i] <- 1
    }
  }
}
d
  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1

The code works, but I wonder how to optimize it so that it can handle longer vectors and larger data frames. Any ideas?

Using ifelse() with %in% will do the trick:

d$dangerous <- ifelse(d$des %in% dan, 1, NA)
> d
  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1
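
To make the role of %in% more concrete, here is a minimal sketch on the same vectors as the question. It only illustrates the idea that %in% returns one logical per element and that this agrees with match(); the variable names flag and pos are illustrative and not part of the answer above.

```r
des <- c('mad', 'crazy', 'stupid', 'crazy', 'wise', 'dumb', 'mad', 'furious')
dan <- c('mad', 'crazy', 'furious')

# %in% returns one logical per element of des: is it present anywhere in dan?
flag <- des %in% dan

# match() gives the position in dan of the first match, or NA when there is none
pos <- match(des, dan)

# Both views agree: an element is "in" dan exactly when match() found a position
stopifnot(identical(flag, !is.na(pos)))

# ifelse() then maps TRUE to 1 and FALSE to NA, element by element
dangerous <- ifelse(flag, 1, NA)
print(dangerous)
```

Because the whole comparison is done in one vectorised call, no explicit loop over rows or over dan is needed.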

Here are timings of the solutions posted so far, plus one of my own.
I timed the functions on the original data.frame d and on a larger data.frame, since the OP frames this as an optimization problem.

OP <- function(DF, dan){
  DF$dangerous <- NA
  for (i in 1:nrow(DF)){
    for(j in 1:length(dan)){
      if (DF$des[i]==dan[j]) DF$dangerous[i] <- 1
    } 
  }
  DF
}

Carles <- function(DF, dan){
  DF$dangerous<-ifelse(DF$des %in% dan, 1, NA)
  DF
}

arg0naut91_1 <- function(DF, dan){
  DF$dangerous <- NA
  transform(DF, dangerous = replace(dangerous, des %in% dan, 1))
}

arg0naut91_2 <- function(DF, dan){
  DF$dangerous <- NA
  DF$dangerous[DF$des %in% dan] <- 1
  DF
}

Rui <- function(DF, dan){
  # FALSE + 1 selects element 1 (NA); TRUE + 1 selects element 2 (1)
  DF$dangerous <- c(NA, 1)[(DF$des %in% dan) + 1]
  DF
}

library(microbenchmark)

mb <- microbenchmark(
  OP = OP(d, dan),
  Carles = Carles(d, dan),
  Rui = Rui(d, dan),
  arg0naut91_1 = arg0naut91_1(d, dan),
  arg0naut91_2 = arg0naut91_2(d, dan)
)
print(mb, order = "median")
#Unit: microseconds
#         expr     min       lq      mean   median       uq       max neval cld
#          Rui  22.623  25.1865  82.73746  27.2510  31.6630  5441.491   100  a 
#       Carles  31.740  34.4120  76.82339  36.9385  42.1760  3753.407   100  a 
# arg0naut91_2  34.131  36.7140  89.10827  39.5925  46.6930  4577.938   100  a 
# arg0naut91_1 226.237 230.1020 296.23198 234.6225 243.3040  4847.553   100  a 
#           OP 757.831 770.1875 926.88995 781.5630 818.2745 10992.040   100   b



e <- d
for(i in 1:10) e <- rbind(e, e)

mb2 <- microbenchmark(
  OP = OP(e, dan),
  Carles = Carles(e, dan),
  Rui = Rui(e, dan),
  arg0naut91_1 = arg0naut91_1(e, dan),
  arg0naut91_2 = arg0naut91_2(e, dan),
  times = 10
)
print(mb2, order = "median")
#Unit: microseconds
#         expr        min         lq        mean      median         uq        max neval cld
#          Rui    291.090    294.690    346.3638    298.9580    301.238    776.769    10  a 
# arg0naut91_2    288.123    292.236    312.6684    311.2435    314.495    388.212    10  a 
#       Carles    427.500    430.120    447.7170    450.2570    453.884    480.424    10  a 
# arg0naut91_1    513.059    517.822    611.0255    666.7095    670.059    688.023    10  a 
#           OP 898781.320 909717.469 911988.3906 914269.7245 916975.858 919223.886    10   b
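
A benchmark is only meaningful if the alternatives return the same result, so it can be worth checking that before comparing timings. The following is a minimal sketch of such a check, reduced to the column each approach computes. The names by_ifelse, by_index, by_assign and by_loop are illustrative, and the data frame is rebuilt here so the block runs on its own.

```r
d <- data.frame(
  id  = 1:8,
  des = c('mad', 'crazy', 'stupid', 'crazy', 'wise', 'dumb', 'mad', 'furious')
)
dan <- c('mad', 'crazy', 'furious')

# Three vectorised variants, each producing the dangerous column
by_ifelse <- ifelse(d$des %in% dan, 1, NA)
by_index  <- c(NA, 1)[(d$des %in% dan) + 1]   # FALSE -> NA, TRUE -> 1
by_assign <- rep(NA_real_, nrow(d))
by_assign[d$des %in% dan] <- 1

# Reference result from the nested loop used in the question
by_loop <- rep(NA_real_, nrow(d))
for (i in seq_len(nrow(d))) {
  for (j in seq_along(dan)) {
    if (d$des[i] == dan[j]) by_loop[i] <- 1
  }
}

# Stops with an error if any variant disagrees with the loop
stopifnot(
  identical(by_ifelse, by_loop),
  identical(by_index, by_loop),
  identical(by_assign, by_loop)
)
```

If the order of the lookup vector in the indexing variant were reversed, this check would fail, which is the kind of slip a timing table alone does not reveal.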

Another option:

transform(d, dangerous = replace(dangerous, des %in% dan, 1))

  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1

Or:

d$dangerous[d$des %in% dan] <- 1
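
If the 1/NA coding is not a hard requirement, a related variant is to store the result of %in% directly as a logical column. This is only a sketch of that alternative, not something the question asks for; the column name is_dangerous and the variable dangerous_ids are hypothetical.

```r
d <- data.frame(
  id  = 1:8,
  des = c('mad', 'crazy', 'stupid', 'crazy', 'wise', 'dumb', 'mad', 'furious')
)
dan <- c('mad', 'crazy', 'furious')

# Logical flag: TRUE where the description is found in dan, FALSE otherwise
# (is_dangerous is a hypothetical column name used for illustration)
d$is_dangerous <- d$des %in% dan

# A logical column can be used directly to filter rows
dangerous_ids <- d$id[d$is_dangerous]
print(dangerous_ids)
```

The trade-off is that non-matching rows become FALSE rather than NA, so this only fits when "not in dan" and "unknown" do not need to be distinguished.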


 