
Create a vector using if…else if…else statements in R

I'm trying to build a factor vector X whose values depend on two (maybe more) columns of a data frame, so it can have more than two levels.

There is an easy way to do this with C/C++-style conditional statements in a for loop. Say I'm constructing X from the values of two boolean columns, Col1 and Col2, in a data frame MATRIX; I can do it like this:

X=vector()
for ( i in 1:nrow(MATRIX)) {
  if (MATRIX$Col1[i]==1 && MATRIX$Col2[i]==1) { 
    X[i] = "both"
  } else if (MATRIX$Col1[i]==1) {
    X[i] = "col1"
  } else if (MATRIX$Col2[i]==1) {
    X[i] = "col2"
  } else {
    X[i] = "none"
  }
}

The problem, obviously, is that this takes a long time to run on large data frames. I should vectorize it, but I can't see how, since functions such as *apply, ifelse or any don't seem to help with a task like this, where the result is not boolean.

Any ideas?

Here are a couple of ways to do it.

The most analogous to your existing method is:

X <- ifelse(MATRIX$Col1==1,
            ifelse(MATRIX$Col2==1,"both","col1"),
            ifelse(MATRIX$Col2==1,"col2","none"))

It can be slightly quicker to do:

# Vectorized: index with whole columns and element-wise `&`
# (no loop index i, and not the scalar `&&`)
x <- rep(NA_character_, nrow(MATRIX))
x[MATRIX$Col1 == 1 & MATRIX$Col2 == 1] <- "both"
x[MATRIX$Col1 == 1 & !(MATRIX$Col2 == 1)] <- "col1"
x[!(MATRIX$Col1 == 1) & MATRIX$Col2 == 1] <- "col2"
x[!(MATRIX$Col1 == 1) & !(MATRIX$Col2 == 1)] <- "none"

but it's harder to see whether the code covers every case.
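One way to check coverage, sketched here on made-up data (the small MATRIX below and the helper names c1 and c2 are assumptions for illustration, not part of the question): start from NA and confirm that nothing is left unassigned after all the conditions have been applied.

```r
# Dummy data (assumed: two 0/1 columns, as in the question)
MATRIX <- data.frame(Col1 = c(0, 1, 1, 0),
                     Col2 = c(0, 1, 0, 1))

# Compute each condition once so the four cases are easy to read
c1 <- MATRIX$Col1 == 1
c2 <- MATRIX$Col2 == 1

# Start from NA so any row missed by the conditions stays visible
x <- rep(NA_character_, nrow(MATRIX))
x[ c1 &  c2] <- "both"
x[ c1 & !c2] <- "col1"
x[!c1 &  c2] <- "col2"
x[!c1 & !c2] <- "none"

# Fails loudly if some row was not covered by any condition
stopifnot(!anyNA(x))

# Counts per label; an NA column would appear here if a case were missed
table(x, useNA = "ifany")
```

If the real columns can contain NA, those rows would stay NA and the stopifnot() call would stop, which is probably what you want to find out early.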

Note:

  • It looks like MATRIX really is a data.frame; learning to be precise about your data types can really help when debugging code.
  • If MATRIX$Col1 really is Boolean, you can drop the ==1 comparison; it wastes time by converting the column to numeric and then testing for equality.
  • To me, the most transparent method is to create a small data.frame with the possible values of Col1 and Col2 and the expected output, and merge this with the existing data.frame, but this may not be as efficient.
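The lookup-table idea from the last point could look roughly like this. It is a minimal sketch on made-up data: the names lookup, row_id and merged are hypothetical, and the columns are assumed to be coded 0/1.

```r
# Dummy data (assumed: two 0/1 columns, as in the question)
MATRIX <- data.frame(Col1 = c(0, 1, 1, 0, 1),
                     Col2 = c(0, 1, 0, 1, 1))

# Lookup table: one row per combination, with the label it maps to
lookup <- data.frame(Col1 = c(1, 1, 0, 0),
                     Col2 = c(1, 0, 1, 0),
                     X    = c("both", "col1", "col2", "none"),
                     stringsAsFactors = FALSE)

# merge() does not keep the original row order,
# so carry a row id along and sort by it afterwards
MATRIX$row_id <- seq_len(nrow(MATRIX))
merged <- merge(MATRIX, lookup, by = c("Col1", "Col2"), all.x = TRUE)
merged <- merged[order(merged$row_id), ]

# all.x = TRUE keeps rows with no match in lookup; their X is NA
X <- merged$X
```

Every case is spelled out as a row of lookup, so a missing combination is easy to spot, and adding a third column only means adding rows and a key to the table.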

We can use factor:

# dummy data
set.seed(1)
MATRIX <- data.frame(Col1 = sample(0:1, 10, replace = TRUE),
                     Col2 = sample(0:1, 10, replace = TRUE))

# using factor
cbind(MATRIX,
      X = factor(paste(as.numeric(MATRIX$Col1 == 1),
                       as.numeric(MATRIX$Col2 == 1), sep = "_"),
                 levels = c("0_0", "0_1", "1_0", "1_1"),
                 labels = c("none", "col2", "col1", "both")))

#     Col1 Col2    X
#  1     0    0 none
#  2     0    0 none
#  3     1    1 both
#  4     1    0 col1
#  5     0    1 col2
#  6     1    0 col1
#  7     1    1 both
#  8     1    1 both
#  9     1    0 col1
# 10     0    1 col2
