I'm looking to remove the value in a variable if the condition of another variable is satisfied. For instance:
df$var1[df$condvar == 0] <- NA
The code above works fine, but I need to repeat this for dozens more variables, so var1 above would change to var2, var3, etc. This is always based on the same condvar, although for half of the variables the condition is df$condvar == 1. It's cumbersome to repeat this line over and over again, and I was wondering if there is a more concise way to code this. Would one of the apply functions help, or would I need to create a custom function?
As a reproducible example, I'm looking to avoid the repetitive nature of the code below:
ex <- mtcars
ex$mpg[ex$vs == 0] <- NA
ex$disp[ex$vs == 0] <- NA
ex$drat[ex$vs == 0] <- NA
ex$cyl[ex$vs == 1] <- NA
ex$hp[ex$vs == 1] <- NA
ex$wt[ex$vs == 1] <- NA
ex
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           NA   6    NA 110   NA 2.620 16.46  0  1    4    4
Mazda RX4 Wag       NA   6    NA 110   NA 2.875 17.02  0  1    4    4
Datsun 710        22.8  NA 108.0  NA 3.85    NA 18.61  1  1    4    1
Hornet 4 Drive    21.4  NA 258.0  NA 3.08    NA 19.44  1  0    3    1
Hornet Sportabout   NA   8    NA 175   NA 3.440 17.02  0  0    3    2
Valiant           18.1  NA 225.0  NA 2.76    NA 20.22  1  0    3    1
Duster 360          NA   8    NA 245   NA 3.570 15.84  0  0    3    4
etc.
I'd be perfectly happy if there's one line of code that applies to all variables for which condvar == 0 and another for those variables for which condvar == 1.
Here's an attempt that is hopefully not too complex. If you set up the vars you want to loop over, and the corresponding values you want to be selected for indexing, you can do:
vars <- c("mpg", "disp", "cyl", "hp")
values <- c(0, 0, 1, 1)
ex[vars] <- Map(function(x, y) replace(x, ex$vs == y, NA), ex[vars], values)
#                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4           NA   6    NA 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag       NA   6    NA 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8  NA 108.0  NA 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4  NA 258.0  NA 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout   NA   8    NA 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1  NA 225.0  NA 2.76 3.460 20.22  1  0    3    1
# ...
If you've only got two groups, you could do this more simply with a couple of assignments, as @HubertL and @Phil mentioned in the comments, but using Map allows you to consider many variables with many possible index values, without ever extending past 3 lines of code.
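To illustrate how the Map idea scales to all six columns from the question, here is a minimal sketch that wraps it in a small helper. The helper name na_where and the rules vector are hypothetical (they are not part of the answer above); the names of rules are the columns to blank out and its values are the condition values that trigger the NA.

```r
# Hypothetical helper (an assumption, not from the original answer):
# set each column named in `rules` to NA wherever data[[cond]] equals
# the value paired with that column.
na_where <- function(data, cond, rules) {
  vars <- names(rules)
  data[vars] <- Map(function(x, y) replace(x, data[[cond]] == y, NA),
                    data[vars], rules)
  data
}

ex <- mtcars
# names = columns to modify, values = value of vs that triggers the NA
rules <- c(mpg = 0, disp = 0, drat = 0, cyl = 1, hp = 1, wt = 1)
ex <- na_where(ex, "vs", rules)
head(ex)
```

Keeping the column/value pairs in one named vector means adding another variable is a one-entry change rather than another line of code.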
Thanks to @HubertL (who is welcome to post this as an answer and I'll upvote) and @smci:
ex[ex$vs == 0, c("mpg", "disp", ...)] <- NA
ex[ex$vs == 1, c("cyl", "hp", ...)] <- NA
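For a concrete version of the two assignments, here is a sketch using the six columns from the question's mtcars example. The vector names zero_vars and one_vars are only illustrative.

```r
ex <- mtcars

# Illustrative names (assumptions): columns grouped by the vs value
# that should trigger the NA.
zero_vars <- c("mpg", "disp", "drat")  # set to NA where vs == 0
one_vars  <- c("cyl", "hp", "wt")      # set to NA where vs == 1

# Row index = the condition, column index = the group of variables.
ex[ex$vs == 0, zero_vars] <- NA
ex[ex$vs == 1, one_vars]  <- NA

head(ex)
```

If the condition column can itself contain NA, wrapping the condition in which(), e.g. ex[which(ex$vs == 0), zero_vars] <- NA, avoids indexing with missing values.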
The dplyr approach using the new experimental case_when function will go something like:
require(dplyr)
ex <- mtcars
ex <- ex %>%
  mutate(mpg = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$mpg)) %>%
  mutate(disp = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$disp)) %>%
  mutate(cyl = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$cyl)) %>%
  mutate(hp = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$hp))
Notes:
Once case_when works it'll be good. Meanwhile, the workaround with filter() is below.
Use .$var to reference the variable on the RHS.
Use as.double(NA) so that the NA has the same type as the column's values.
TRUE ~ ... specifies the default clause.
Working workaround with filter():
ex <- rbind(ex %>% filter(vs==0) %>% mutate(mpg=NA, disp=NA),
            ex %>% filter(vs==1) %>% mutate(cyl=NA, hp=NA))
which has the side-effect of rearranging rows due to the split on vs
Try:
df$var1 <- ifelse(df$condvar == 0, NA, df$var1)
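The ifelse() idea can be applied to several columns at once with lapply(). This is a minimal sketch on the question's mtcars example, testing the condition column vs rather than the column being changed; the vars vector is only illustrative.

```r
ex <- mtcars

# Illustrative choice of columns (an assumption): blank these where vs == 0.
vars <- c("mpg", "disp", "drat")

# For each column, keep the original value unless vs == 0, in which case use NA.
ex[vars] <- lapply(ex[vars], function(x) ifelse(ex$vs == 0, NA, x))

head(ex)
```

Note that ifelse() drops attributes, so for columns such as factors or dates, replace(x, ex$vs == 0, NA) is likely the safer choice.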