简体   繁体   中英

Setting a value in a variable to NA, conditional on another variable

I'm looking to remove the value in a variable if the condition of another variable is satisfied. For instance:

df$var1[df$condvar == 0] <- NA

The code above works fine, but I need to repeat this for dozens more variables, so var1 above would change to var2 , var3 , etc.. This is always based on the same condvar , although for half of the variables the condition is df$condvar == 1 . It's cumbersome to repeat this line over and over again, and I was wondering if there was a more concise way to code this. Would one of the apply functions help, or would I need to create a custom function?

As a reproducible example, I'm looking to avoid the repetitive nature of the code below:

ex <- mtcars
ex$mpg[ex$vs == 0] <- NA
ex$disp[ex$vs == 0] <- NA
ex$drat[ex$vs == 0] <- NA
ex$cyl[ex$vs == 1] <- NA
ex$hp[ex$vs == 1] <- NA
ex$wt[ex$vs == 1] <- NA
ex


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             NA   6    NA 110   NA 2.620 16.46  0  1    4    4
Mazda RX4 Wag         NA   6    NA 110   NA 2.875 17.02  0  1    4    4
Datsun 710          22.8  NA 108.0  NA 3.85    NA 18.61  1  1    4    1
Hornet 4 Drive      21.4  NA 258.0  NA 3.08    NA 19.44  1  0    3    1
Hornet Sportabout     NA   8    NA 175   NA 3.440 17.02  0  0    3    2
Valiant             18.1  NA 225.0  NA 2.76    NA 20.22  1  0    3    1
Duster 360            NA   8    NA 245   NA 3.570 15.84  0  0    3    4
etc.

I'd be perfectly happy if there's one line of code that applies to all variables for which condvar == 0 and another for those variables for which condvar == 1 .

Here's an attempt that is hopefully not too complex. If you set up the vars you want to loop over, and the corresponding values you want to be selected for indexing, you can do:

vars   <- c("mpg", "disp", "cyl", "hp")
values <- c(0, 0, 1, 1)

ex[vars] <- Map(function(x,y) replace(x, ex$vs == y, NA), ex[vars], vals)

#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4             NA   6    NA 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag         NA   6    NA 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8  NA 108.0  NA 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4  NA 258.0  NA 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout     NA   8    NA 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1  NA 225.0  NA 2.76 3.460 20.22  1  0    3    1
# ...

If you've only got two groups, you could do this simpler via a couple of assignments as @HubertL and @Phil mentioned in the comments, but using Map allows you consider many variables with many possible index values, without ever extending past 3 lines of code.

Thanks to @HubertL (who is welcome to post this as an answer and I'll upvote) and @smci:

ex[ex$vs == 0, c("mpg", "disp", ...)] <- NA
ex[ex$vs == 1, c("cyl", "hp", ...)] <- NA

The dplyr approach using the new experimental case_when function will go something like:

require(dplyr)

ex <- mtcars
ex <- ex %>%
      mutate(mpg  = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$mpg)) %>%
      mutate(disp = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$disp)) %>%
      mutate(cyl  = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$cyl)) %>%
      mutate(hp   = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$hp))

Notes:

Working workaround with filter() :

ex <- rbind(ex %>% filter(vs==0) %>% mutate(mpg=NA, disp=NA),
            ex %>% filter(vs==1) %>% mutate(cyl=NA, hp=NA) )

which has the side-effect of rearranging rows due to the split on vs

尝试:

ifelse(df$var1 == 0, NA, df$var1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM