
Setting a value in a variable to NA, conditional on another variable

I'm looking to remove the value in a variable when a condition on another variable is satisfied. For instance:

df$var1[df$condvar == 0] <- NA

The code above works fine, but I need to repeat it for dozens more variables, so var1 above would change to var2, var3, etc. This is always based on the same condvar, although for half of the variables the condition is df$condvar == 1. It's cumbersome to repeat this line over and over again, and I was wondering if there was a more concise way to code this. Would one of the apply functions help, or would I need to create a custom function?

As a reproducible example, I'm looking to avoid the repetitive code below:

ex <- mtcars
ex$mpg[ex$vs == 0] <- NA
ex$disp[ex$vs == 0] <- NA
ex$drat[ex$vs == 0] <- NA
ex$cyl[ex$vs == 1] <- NA
ex$hp[ex$vs == 1] <- NA
ex$wt[ex$vs == 1] <- NA
ex


                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             NA   6    NA 110   NA 2.620 16.46  0  1    4    4
Mazda RX4 Wag         NA   6    NA 110   NA 2.875 17.02  0  1    4    4
Datsun 710          22.8  NA 108.0  NA 3.85    NA 18.61  1  1    4    1
Hornet 4 Drive      21.4  NA 258.0  NA 3.08    NA 19.44  1  0    3    1
Hornet Sportabout     NA   8    NA 175   NA 3.440 17.02  0  0    3    2
Valiant             18.1  NA 225.0  NA 2.76    NA 20.22  1  0    3    1
Duster 360            NA   8    NA 245   NA 3.570 15.84  0  0    3    4
etc.

I'd be perfectly happy with one line of code that handles all the variables conditioned on condvar == 0 and another for the variables conditioned on condvar == 1.
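On the custom-function part of the question: a small wrapper around data-frame assignment is enough, and no apply function is needed when every column shares the same condition. This is a minimal sketch; set_na_where is a hypothetical name, not an existing function. It uses which() so that NA values in the condition are treated as "not met" rather than triggering an error in the data-frame assignment.

```r
# Hypothetical helper (not from the original post): set every column named in
# `vars` to NA on the rows where `cond` is TRUE.
set_na_where <- function(df, vars, cond) {
  # which() keeps only the TRUE positions, so NA entries in `cond`
  # are skipped instead of raising an error
  df[which(cond), vars] <- NA
  df
}

ex <- mtcars
ex <- set_na_where(ex, c("mpg", "disp", "drat"), ex$vs == 0)
ex <- set_na_where(ex, c("cyl", "hp", "wt"), ex$vs == 1)
head(ex)
```

Each call covers one condition, which matches the "one line per condition" wish above.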

Here's an attempt that is hopefully not too complex. If you set up the vars you want to loop over, and the corresponding values you want to be selected for indexing, you can do:

vars   <- c("mpg", "disp", "cyl", "hp")
values <- c(0, 0, 1, 1)

# Map() pairs each column with its value; replace() sets NA where vs matches it
ex[vars] <- Map(function(x, y) replace(x, ex$vs == y, NA), ex[vars], values)

#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4             NA   6    NA 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag         NA   6    NA 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8  NA 108.0  NA 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4  NA 258.0  NA 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout     NA   8    NA 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1  NA 225.0  NA 2.76 3.460 20.22  1  0    3    1
# ...

If you've only got two groups, you could do this more simply with a couple of assignments, as @HubertL and @Phil mentioned in the comments, but Map lets you handle many variables with many possible index values without ever extending past 3 lines of code.
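To illustrate the "many possible index values" case, here is a sketch conditioning on a column with three values instead of two. The pairing below (gear, and which column is blanked for which gear) is an arbitrary illustration chosen for this example, not something from the question.

```r
ex <- mtcars

# Illustrative pairing (not from the original post): gear takes the values
# 3, 4 and 5, and each column is blanked for one of them.
vars   <- c("mpg", "hp", "wt")
values <- c(3, 4, 5)

# Map() walks the columns and the values in parallel; replace() returns a copy
# of the column with NA wherever gear equals the paired value.
ex[vars] <- Map(function(x, y) replace(x, ex$gear == y, NA), ex[vars], values)
head(ex)
```

Adding another variable only means extending vars and values; the Map line itself does not change.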

Thanks to @HubertL (who is welcome to post this as an answer, and I'll upvote) and @smci:

ex[ex$vs == 0, c("mpg", "disp", ...)] <- NA
ex[ex$vs == 1, c("cyl", "hp", ...)] <- NA
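Applied to the question's reproducible example, the two-assignment approach covers the six columns from that example (mpg, disp and drat for vs == 0; cyl, hp and wt for vs == 1). The sketch below is meant to reproduce exactly what the six repetitive lines in the question do.

```r
ex <- mtcars

# One assignment per condition: the row index picks the matching rows,
# the character vector picks every column to blank on those rows
ex[ex$vs == 0, c("mpg", "disp", "drat")] <- NA
ex[ex$vs == 1, c("cyl", "hp", "wt")] <- NA
head(ex)
```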

The dplyr approach, using the then-new (experimental at the time) case_when function, would go something like:

library(dplyr)

ex <- mtcars
# Inside mutate(), columns can be referenced directly; NA_real_ keeps each column double
ex <- ex %>%
      mutate(mpg  = case_when(vs == 0 ~ NA_real_, TRUE ~ mpg),
             disp = case_when(vs == 0 ~ NA_real_, TRUE ~ disp),
             cyl  = case_when(vs == 1 ~ NA_real_, TRUE ~ cyl),
             hp   = case_when(vs == 1 ~ NA_real_, TRUE ~ hp))
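With dplyr 1.0.0 or later (an assumption about the installed version), across() can apply one rule to a whole set of columns, so the case_when version no longer needs one line per variable. This sketch uses if_else() rather than case_when() because there is only one condition per group; the column sets are the ones from the question's example.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, where across() is available.
# Each across() applies the same lambda to every listed column;
# .x is the current column, and vs is looked up in the data.
ex <- mtcars %>%
  mutate(
    across(c(mpg, disp, drat), ~ if_else(vs == 0, NA_real_, .x)),
    across(c(cyl, hp, wt),     ~ if_else(vs == 1, NA_real_, .x))
  )
head(ex)
```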

Notes:

A working alternative with filter():

ex <- rbind(ex %>% filter(vs==0) %>% mutate(mpg=NA, disp=NA),
            ex %>% filter(vs==1) %>% mutate(cyl=NA, hp=NA) )

This has the side effect of reordering the rows, because the data are split on vs.
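One way to undo that reordering is to record the original row position before splitting and sort on it afterwards. This is a sketch: row_id is a hypothetical helper column introduced here, and bind_rows() stands in for rbind() so the pieces are combined the dplyr way.

```r
library(dplyr)

# row_id is a hypothetical helper column recording the original row order
ex <- mtcars %>% mutate(row_id = row_number())

ex <- bind_rows(
  ex %>% filter(vs == 0) %>% mutate(mpg = NA_real_, disp = NA_real_),
  ex %>% filter(vs == 1) %>% mutate(cyl = NA_real_, hp = NA_real_)
) %>%
  arrange(row_id) %>%  # restore the order lost by splitting on vs
  select(-row_id)      # drop the helper column again
head(ex)
```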

Try:

# Test the condition variable, and assign the result back to the column
df$var1 <- ifelse(df$condvar == 0, NA, df$var1)
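The ifelse() idea extends to many columns if it is wrapped in lapply(). The test must be on the condition column (vs in the question's mtcars example), not on the column being modified. A sketch for the vs == 0 group from the question:

```r
ex <- mtcars
cols <- c("mpg", "disp", "drat")

# lapply() runs ifelse() on each selected column; the test is always ex$vs,
# and the list of results is assigned back into those columns
ex[cols] <- lapply(ex[cols], function(x) ifelse(ex$vs == 0, NA, x))
head(ex)
```

One caveat: ifelse() takes its attributes from the test vector, so columns such as factors or Dates would lose their class; for plain numeric columns like these that does not matter.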


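Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.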

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM