Setting a value in a variable to NA, conditional on another variable

Question

I'm looking to remove the value in a variable if the condition of another variable is satisfied. For instance:

df$var1[df$condvar == 0] <- NA

The code above works fine, but I need to repeat this for dozens more variables, so var1 above would change to var2, var3, etc. The condition is always based on the same condvar, although for half of the variables it is df$condvar == 1. Repeating this line over and over is cumbersome, and I was wondering whether there is a more concise way to code this. Would one of the apply functions help, or would I need to write a custom function?

As a reproducible example, I'm looking to avoid the repetitive nature of the code below:

ex <- mtcars
ex$mpg[ex$vs == 0] <- NA
ex$disp[ex$vs == 0] <- NA
ex$drat[ex$vs == 0] <- NA
ex$cyl[ex$vs == 1] <- NA
ex$hp[ex$vs == 1] <- NA
ex$wt[ex$vs == 1] <- NA
ex


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             NA   6    NA 110   NA 2.620 16.46  0  1    4    4
Mazda RX4 Wag         NA   6    NA 110   NA 2.875 17.02  0  1    4    4
Datsun 710          22.8  NA 108.0  NA 3.85    NA 18.61  1  1    4    1
Hornet 4 Drive      21.4  NA 258.0  NA 3.08    NA 19.44  1  0    3    1
Hornet Sportabout     NA   8    NA 175   NA 3.440 17.02  0  0    3    2
Valiant             18.1  NA 225.0  NA 2.76    NA 20.22  1  0    3    1
Duster 360            NA   8    NA 245   NA 3.570 15.84  0  0    3    4
etc.

I'd be perfectly happy with one line of code that covers all variables for which the condition is condvar == 0 and another for those for which it is condvar == 1.

Answer 1

Here's an attempt that is hopefully not too complex. Set up the variables you want to loop over and, for each one, the value of vs that should select the rows to blank out; then you can do:

vars   <- c("mpg", "disp", "cyl", "hp")
values <- c(0, 0, 1, 1)

ex[vars] <- Map(function(x, y) replace(x, ex$vs == y, NA), ex[vars], values)

#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4             NA   6    NA 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag         NA   6    NA 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8  NA 108.0  NA 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4  NA 258.0  NA 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout     NA   8    NA 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1  NA 225.0  NA 2.76 3.460 20.22  1  0    3    1
# ...

If you've only got two groups, you could do this more simply with a couple of assignments, as @HubertL and @Phil mentioned in the comments, but using Map allows you to consider many variables with many possible index values without ever extending past 3 lines of code.
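As a variation on the Map idea, here is a minimal sketch that keeps each column and its trigger value together in one named vector, so the two cannot drift out of step. The name na_when is made up for illustration, and the six columns are the ones from the question.

```r
# Sketch: one named vector pairs each column with the vs value that blanks it.
# `na_when` is a hypothetical name, not part of the original answer.
ex <- mtcars
na_when <- c(mpg = 0, disp = 0, drat = 0, cyl = 1, hp = 1, wt = 1)

ex[names(na_when)] <- Map(
  function(col, trigger) replace(col, ex$vs == trigger, NA),
  ex[names(na_when)],  # the columns to modify
  na_when              # the matching trigger value for each column
)

head(ex)
```

Adding a variable, or a third trigger value, then only means adding one entry to na_when.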

Answer 2

Thanks to @HubertL (who is welcome to post this as an answer and I'll upvote) and @smci:

ex[ex$vs == 0, c("mpg", "disp", ...)] <- NA
ex[ex$vs == 1, c("cyl", "hp", ...)] <- NA
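For a version that runs as-is, here is the same pair of assignments written out with the six columns from the question. Wrapping the condition in which() is an optional addition, not part of the original answer: mtcars$vs has no missing values, but a data frame assignment with a logical row index stops with an error if that index contains NA, and which() drops those rows instead. The toy data frame is made up to show that case.

```r
# The two assignments, using the column sets from the question.
ex <- mtcars
ex[which(ex$vs == 0), c("mpg", "disp", "drat")] <- NA
ex[which(ex$vs == 1), c("cyl", "hp", "wt")] <- NA

# Hypothetical toy data where the condition variable itself has an NA.
toy <- data.frame(condvar = c(0, 1, NA), var1 = c(10, 20, 30))
toy[which(toy$condvar == 0), "var1"] <- NA  # row 3 is left untouched
toy
```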

Answer 3

The dplyr approach using the new experimental case_when function will go something like:

require(dplyr)

ex <- mtcars
ex <- ex %>%
      mutate(mpg  = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$mpg)) %>%
      mutate(disp = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$disp)) %>%
      mutate(cyl  = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$cyl)) %>%
      mutate(hp   = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$hp))

Notes:

Hadley said on 2016-06-27: "case_when() is still somewhat experiment and does not currently work inside mutate(). That will be fixed in a future version." It took me 40 min to get this code to this point. You get the idea. Once case_when works it'll be good. Meanwhile, the workaround with filter() is below.
You have to use .$var to reference the variable on the RHS.
You have to specify the type of NA on the RHS, hence all the as.double(NA).
The TRUE ~ ... clause specifies the default.

Working workaround with filter():

ex <- rbind(ex %>% filter(vs==0) %>% mutate(mpg=NA, disp=NA),
            ex %>% filter(vs==1) %>% mutate(cyl=NA, hp=NA) )

This has the side effect of rearranging rows, because the data is split on vs and then recombined.
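For readers on a newer dplyr, a possible alternative is sketched below. It assumes dplyr 1.0.0 or later, where across() is available, and uses the six columns from the question; it is not part of the original answer. Because the columns are modified in place, the rows are not split and recombined.

```r
library(dplyr)

# Sketch, assuming dplyr >= 1.0.0: apply the same replacement to each group of columns.
ex <- mtcars %>%
  mutate(
    across(c(mpg, disp, drat), ~ replace(.x, vs == 0, NA)),
    across(c(cyl, hp, wt),     ~ replace(.x, vs == 1, NA))
  )

head(ex)
```

Inside the lambda, .x is the current column and vs is looked up in the data frame, so no .$ prefix is needed.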

Answer 4

Try:

df$var1 <- ifelse(df$condvar == 0, NA, df$var1)
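To cover several columns with ifelse(), one option is to combine it with lapply(), testing the condition variable and assigning the result back. This is a minimal sketch on made-up toy data; the names df, condvar, var1 and var2 follow the question's placeholders.

```r
# Hypothetical toy data using the question's placeholder names.
df <- data.frame(condvar = c(0, 1, 0, 1), var1 = 1:4, var2 = 5:8)

cols <- c("var1", "var2")
# ifelse() returns a new vector, so the result has to be assigned back.
df[cols] <- lapply(df[cols], function(x) ifelse(df$condvar == 0, NA, x))
df
```

Note that ifelse() drops attributes of its inputs, so factor or Date columns are probably better handled with replace() or direct indexing.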

4 answers

solution1
5 ACCEPTED 2016-10-13 00:13:10

solution2
4 2016-10-13 00:37:36

solution3
3 2016-10-13 00:27:11

solution4
0 2016-10-12 23:59:40
