Change the level of one factor based on the level of another factor

Question

I have a data set with many variables, two of which are called "animal" and "plant". Both variables are factors, and both are binary, i.e. each is either a single text value or NA.

For example:

animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)
value  <- c(1:5)
df     <- data.frame(animal, plant, value)

> df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat   ivy     3
4    cat  <NA>     4
5   <NA>  <NA>     5

When the value of plant is "ivy" and the value of animal is "cat", I want to change the value of plant to NA (i.e., the two things cannot both be true, and the animal value takes priority). I don't want any changes in my other variables.

I've tried the following but get an error message:

df <- df %>% if (isTRUE(animal == "cat")) {plant==NA}

Error in if (.) isTRUE(animal == "cat") else { : 
  argument is not interpretable as logical
In addition: Warning message:
In if (.) isTRUE(animal == "cat") else { :
  the condition has length > 1 and only the first element will be used

My goal output is:

> df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5

I would really appreciate any help. I'm sure there is a really simple way of doing this; maybe I can't see the wood for the trees.

Answer 1

library(dplyr)

df %>%
  mutate(plant = case_when(animal == 'cat' & plant == 'ivy' ~ NA_character_,
                           TRUE ~ plant))

This gives us:

  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5

Answer 2

You could also do:

df[!(is.na(df$animal)|is.na(df$plant)),'plant'] <- NA
df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5

This can also be expressed as:

df[!is.na(df$animal) & !is.na(df$plant),'plant'] <- NA
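
The two index expressions above are equivalent by De Morgan's law. The following is a minimal sketch that checks this on the vectors from the question; the names idx_or and idx_and are made up for illustration. Note that both forms rely on the question's premise that each column holds only one text value or NA, since they test for presence rather than for "cat" or "ivy" specifically.

```r
animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)

idx_or  <- !(is.na(animal) | is.na(plant))  # not (either one is missing)
idx_and <- !is.na(animal) & !is.na(plant)   # both are present

identical(idx_or, idx_and)  # TRUE
which(idx_and)              # 3
```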

Answer 3

Your problem seems to be simpler than you think. Because both columns are binary, you can achieve the same result simply by setting plant to NA wherever animal is not NA:

df$plant[!is.na(df$animal)] <- NA

Or a bit fancier:

is.na(df$plant) <- !is.na(df$animal)
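
The "fancier" form uses the replacement function `is.na<-`, which sets the indexed positions to NA. A small sketch on a plain vector, with a hypothetical logical mask named mask, may make the behaviour easier to see:

```r
x    <- c("ivy", NA, "ivy", NA, NA)
mask <- c(FALSE, FALSE, TRUE, TRUE, FALSE)  # positions to blank out

is.na(x) <- mask  # same effect as x[mask] <- NA
x
```

Only the positions where mask is TRUE are changed; values that were already NA stay NA.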

Answer 4

The problem here is that == does not work intuitively with NA values in R.

> df[df$animal=="cat",]
     animal plant value
NA     <NA>  <NA>    NA
NA.1   <NA>  <NA>    NA
3       cat   ivy     3
4       cat  <NA>     4
NA.2   <NA>  <NA>    NA

Here, for example, five rows are returned instead of two: NA == "ANYTHING" returns NA, and indexing with NA yields a row filled with NA for each such position.

You could define this function, which returns TRUE if both x and y are equal and not NA, or if both are NA.
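
A short sketch of that behaviour on the animal vector from the question. Wrapping the comparison in which() is one common base R way to drop the NA positions; it is shown here as an illustration, not as part of this answer's approach.

```r
animal <- c(NA, NA, "cat", "cat", NA)

cmp <- animal == "cat"
cmp         # NA NA TRUE TRUE NA
which(cmp)  # 3 4, the NA positions are dropped
```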

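# NA-aware equality: TRUE if x and y are equal and non-NA, or if both are NA.
# An NA in x compared with a non-NA y gives FALSE instead of NA.
is.equal.force <- `%===%` <- function(x, y, vect = TRUE) {
  res <- ifelse(is.na(x), is.na(y), ifelse(!is.na(y) & !is.na(x), x == y, NA))
  if (!vect) {
    # collapse the element-wise result into a single value
    res <- all(res)
  }
  return(res)
}

Then the solution to your problem becomes simply: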

df[df$animal%===%"cat"&df$plant%===%"ivy","plant"] <- NA
df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5

Note that the value is assigned with <- here; the attempt in the question used ==, which compares rather than assigns.
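
As a sketch of an alternative that avoids defining a helper, base R's %in% never returns NA, so it can build the same row mask directly. The name hit is made up for illustration; this is a substitute technique, not part of the answer above.

```r
animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)
df     <- data.frame(animal, plant, value = 1:5)

# %in% yields FALSE (not NA) where the left-hand side is NA
hit <- df$animal %in% "cat" & df$plant %in% "ivy"
df$plant[hit] <- NA
df
```

Only row 3 is selected by the mask, so the other columns and rows are left untouched.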

4 answers

Answer 1
1 2020-11-09 18:13:42

Answer 2
1 2020-11-09 18:18:31

Answer 3
1 accepted 2020-11-09 18:19:11

Answer 4
1 2020-11-09 18:20:30