Setting values of a variable to NA, conditional on another variable

Question

I'm looking to remove the value in a variable if a condition on another variable is satisfied. For instance:

df$var1[df$condvar == 0] <- NA

The code above works fine, but I need to repeat it for dozens more variables, so var1 above would change to var2, var3, etc. This is always based on the same condvar, although for half of the variables the condition is df$condvar == 1. It's cumbersome to repeat this line over and over again, and I was wondering if there is a more concise way to code this. Would one of the apply functions help, or would I need to create a custom function?

As a reproducible example, I'm looking to avoid the repetitive nature of the code below:

ex <- mtcars
ex$mpg[ex$vs == 0] <- NA
ex$disp[ex$vs == 0] <- NA
ex$drat[ex$vs == 0] <- NA
ex$cyl[ex$vs == 1] <- NA
ex$hp[ex$vs == 1] <- NA
ex$wt[ex$vs == 1] <- NA
ex


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             NA   6    NA 110   NA 2.620 16.46  0  1    4    4
Mazda RX4 Wag         NA   6    NA 110   NA 2.875 17.02  0  1    4    4
Datsun 710          22.8  NA 108.0  NA 3.85    NA 18.61  1  1    4    1
Hornet 4 Drive      21.4  NA 258.0  NA 3.08    NA 19.44  1  0    3    1
Hornet Sportabout     NA   8    NA 175   NA 3.440 17.02  0  0    3    2
Valiant             18.1  NA 225.0  NA 2.76    NA 20.22  1  0    3    1
Duster 360            NA   8    NA 245   NA 3.570 15.84  0  0    3    4
etc.

I'd be perfectly happy with one line of code that handles all the variables conditioned on condvar == 0 and another line for those conditioned on condvar == 1.

Answer 1

Here's an attempt that is hopefully not too complex. If you set up the vars you want to loop over, and the corresponding values you want to be selected for indexing, you can do:

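ex <- mtcars

# Columns to modify, paired element-wise with the value of vs that triggers NA
vars   <- c("mpg", "disp", "cyl", "hp")
values <- c(0, 0, 1, 1)

# For each column, set the rows where ex$vs equals its paired value to NA
ex[vars] <- Map(function(x, y) replace(x, ex$vs == y, NA), ex[vars], values)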

#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4             NA   6    NA 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag         NA   6    NA 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8  NA 108.0  NA 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4  NA 258.0  NA 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout     NA   8    NA 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1  NA 225.0  NA 2.76 3.460 20.22  1  0    3    1
# ...
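For comparison, here is the same Map() pattern covering all six columns from the question's mtcars example. The column/value pairing is taken directly from the question's repetitive code, and the sketch assumes ex starts as an unmodified copy of mtcars.

```r
ex <- mtcars

# Each column is paired with the value of vs that should blank it
vars <- c("mpg", "disp", "drat", "cyl", "hp", "wt")
vals <- c(0, 0, 0, 1, 1, 1)

# Map() walks vars and vals in parallel; replace() sets the matching rows to NA
ex[vars] <- Map(function(x, y) replace(x, ex$vs == y, NA), ex[vars], vals)
```

Adding another column is just one more entry in each of the two vectors, so the assignment itself never grows.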

If you've only got two groups, you could do this more simply with a couple of assignments, as @HubertL and @Phil mentioned in the comments, but using Map lets you handle many variables with many possible index values without ever extending past 3 lines of code.
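Since the question also asks whether a custom function is needed, here is one way to wrap the idea in a reusable helper. set_na_by() is a hypothetical name, not part of base R or any package. It takes a named list in which each name is a value of the condition variable and each element lists the columns to blank where that value occurs. This is a sketch, assuming the condition column's values can be matched as strings.

```r
# set_na_by() is a hypothetical helper, not a base R or package function.
# rules: named list; names are values of df[[cond]], elements are the
# column names to set to NA in rows where df[[cond]] equals that value.
set_na_by <- function(df, cond, rules) {
  for (val in names(rules)) {
    # which() drops rows where the condition is NA, so they are left untouched
    rows <- which(as.character(df[[cond]]) == val)
    df[rows, rules[[val]]] <- NA
  }
  df
}

ex <- set_na_by(mtcars, "vs",
                list("0" = c("mpg", "disp", "drat"),
                     "1" = c("cyl", "hp", "wt")))
```

Matching on as.character() means the same call also works when the condition variable is a factor or character vector. Each extra group is one more list entry.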

Answer 2

Thanks to @HubertL (who is welcome to post this as an answer, and I'll upvote it) and @smci:

ex[ex$vs == 0, c("mpg", "disp", ...)] <- NA
ex[ex$vs == 1, c("cyl", "hp", ...)] <- NA
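A runnable version of these two lines, using the columns from the question's mtcars example as stand-ins for your own variables (the column lists here are only the example's, not a fill-in of the "..." above):

```r
ex <- mtcars

# One assignment per value of the condition variable
ex[ex$vs == 0, c("mpg", "disp", "drat")] <- NA
ex[ex$vs == 1, c("cyl", "hp", "wt")] <- NA
```

One caveat: if the condition variable can itself contain NA, a logical row index makes this assignment fail with "missing values are not allowed in subscripted assignments of data frames". Wrapping the condition in which(), e.g. ex[which(ex$vs == 0), cols] <- NA, avoids that.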

Answer 3

A dplyr approach using the new, experimental case_when() function would look something like this:

library(dplyr)

ex <- mtcars
ex <- ex %>%
      mutate(mpg  = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$mpg)) %>%
      mutate(disp = case_when(.$vs==0 ~ as.double(NA), TRUE ~ .$disp)) %>%
      mutate(cyl  = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$cyl)) %>%
      mutate(hp   = case_when(.$vs==1 ~ as.double(NA), TRUE ~ .$hp))

Notes:

Hadley said on 2016-06-27: "case_when() is still somewhat experiment and does not currently work inside mutate(). That will be fixed in a future version." It took me 40 minutes to get this code to this point. You get the idea. Once case_when works, it'll be good. Meanwhile, the workaround with filter() is below.
You have to use .$var to reference the variable on the RHS.
You have to specify the type of NA on the RHS, hence all the as.double(NA) calls.
The TRUE ~ ... clause specifies the default.
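The notes above describe dplyr as it was in 2016. In later dplyr releases, column names can be used directly inside mutate(), and across() applies one transformation to several columns at once. The sketch below assumes dplyr 1.0.0 or later and uses base replace() in place of case_when(). An equivalent lambda would be ~ case_when(vs == 0 ~ NA_real_, TRUE ~ .x).

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which introduced across()

ex <- mtcars %>%
  mutate(
    # blank these columns in rows where vs == 0
    across(c(mpg, disp, drat), ~ replace(.x, vs == 0, NA)),
    # blank these columns in rows where vs == 1
    across(c(cyl, hp, wt), ~ replace(.x, vs == 1, NA))
  )
```

Inside the across() lambdas, .x is the column being transformed, while vs is looked up from the data, so no .$ prefix is needed.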

A working workaround with filter():

ex <- rbind(ex %>% filter(vs==0) %>% mutate(mpg=NA, disp=NA),
            ex %>% filter(vs==1) %>% mutate(cyl=NA, hp=NA) )

This has the side effect of reordering the rows, because the data are split on vs.
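If the original row order matters, one option is to tag each row before splitting and sort on that tag afterwards. In this sketch, row_id is a hypothetical helper column added only for this purpose. The sketch also uses bind_rows() instead of rbind() and typed NA_real_ values so the recombined columns stay numeric.

```r
library(dplyr)

# row_id is a temporary, hypothetical column used only to restore order
ex <- mtcars %>% mutate(row_id = row_number())

ex <- bind_rows(
  ex %>% filter(vs == 0) %>% mutate(mpg = NA_real_, disp = NA_real_),
  ex %>% filter(vs == 1) %>% mutate(cyl = NA_real_, hp = NA_real_)
) %>%
  arrange(row_id) %>%   # put rows back in their original order
  select(-row_id)       # drop the helper column
```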

Answer 4

Try:

df$var1 <- ifelse(df$condvar == 0, NA, df$var1)
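On its own, ifelse() still has to be repeated for every column. One way to avoid that is to loop it over a set of columns with lapply(). The sketch below uses the question's mtcars example. The important detail is that the test is on the condition variable vs, not on the column being modified.

```r
ex <- mtcars
cols_vs0 <- c("mpg", "disp", "drat")  # blank where vs == 0
cols_vs1 <- c("cyl", "hp", "wt")      # blank where vs == 1

# ifelse() keeps x where the condition is FALSE and returns NA where it is TRUE
ex[cols_vs0] <- lapply(ex[cols_vs0], function(x) ifelse(ex$vs == 0, NA, x))
ex[cols_vs1] <- lapply(ex[cols_vs1], function(x) ifelse(ex$vs == 1, NA, x))
```

Note that ifelse() drops attributes of x (for example, factor levels or Date classes), so replace() as used in Answer 1 is the safer choice for non-numeric columns.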

4 answers

Answer 1
5 votes, accepted, 2016-10-13 00:13:10

Answer 2
4 votes, 2016-10-13 00:37:36

Answer 3
3 votes, 2016-10-13 00:27:11

Answer 4
0 votes, 2016-10-12 23:59:40
