Change the level of a factor based on the level of another factor
I have a data set with many variables, two of which are called "animal" and "plant". Both variables are factors, and both are binary, i.e. each one is either a text value or NA.
For example:
animal <- c(NA, NA, "cat", "cat", NA)
plant <- c("ivy", NA, "ivy", NA, NA)
value <- c(1:5)
df <- data.frame(animal, plant, value)
> df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat   ivy     3
4    cat  <NA>     4
5   <NA>  <NA>     5
When the value of plant is "ivy" and the value of animal is "cat", I want to change the value of plant to NA (i.e. the two things cannot both be true, and the animal value takes priority). I don't want any changes in my other variables.
I've tried the following but get an error message:
df <- df %>% if (isTRUE(animal == "cat")) {plant==NA}
Error in if (.) isTRUE(animal == "cat") else { :
argument is not interpretable as logical
In addition: Warning message:
In if (.) isTRUE(animal == "cat") else { :
the condition has length > 1 and only the first element will be used
My goal output is:
> df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5
I would really appreciate any help. I'm sure there is a really simple way of doing this; maybe I can't see the wood for the trees.
library(dplyr)
df %>%
  mutate(plant = case_when(animal == 'cat' & plant == 'ivy' ~ NA_character_,
                           TRUE ~ plant))
This gives us:
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5
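The pipeline above only prints the modified data frame. To keep the change, the result has to be assigned back. Here is a minimal sketch, assuming the columns are plain character vectors (`stringsAsFactors = FALSE` is my assumption; the question uses factors, and some dplyr versions may complain about mixing a factor with `NA_character_`, in which case converting with `as.character()` first should help):

```r
library(dplyr)

# Example data rebuilt as character columns (assumption, see above)
df <- data.frame(
  animal = c(NA, NA, "cat", "cat", NA),
  plant  = c("ivy", NA, "ivy", NA, NA),
  value  = 1:5,
  stringsAsFactors = FALSE
)

# Assign the result back so the change is stored, not just printed.
# case_when() treats an NA condition like FALSE, so rows where
# animal or plant is NA fall through to the TRUE branch.
df <- df %>%
  mutate(plant = case_when(
    animal == "cat" & plant == "ivy" ~ NA_character_,
    TRUE ~ plant
  ))

df
```

Only `plant` in row 3 changes; `animal` and `value` are left as they were.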
You could also do:
df[!(is.na(df$animal)|is.na(df$plant)),'plant'] <- NA
df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5
This can also be expressed as:
df[!is.na(df$animal) & !is.na(df$plant),'plant'] <- NA
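The two forms select the same rows because `!(a | b)` is equivalent to `!a & !b` (De Morgan's law). A quick check on the example vectors, as a sketch (the names `both_present_a` and `both_present_b` are just illustrative):

```r
animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)

# Form 1: negate "at least one of the two is NA"
both_present_a <- !(is.na(animal) | is.na(plant))

# Form 2: "animal is present AND plant is present"
both_present_b <- !is.na(animal) & !is.na(plant)

identical(both_present_a, both_present_b)  # TRUE
which(both_present_b)                      # 3
```

Since `is.na()` never returns NA itself, neither mask contains NA, which is why it is safe to use them directly as a row index.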
Your problem seems to be simpler than you think. Because both variables are binary, you can achieve the same result simply by setting plant to NA wherever animal is not NA:
df$plant[!is.na(df$animal)] <- NA
Or a bit fancier:
is.na(df$plant) <- !is.na(df$animal)
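For readers who have not seen the replacement form `is.na(x) <- index` before: it sets the indexed elements of `x` to NA. A small sketch on bare vectors rather than the data frame, relying on the question's premise that each column has a single non-NA value:

```r
animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)

# Mark plant as NA wherever animal is present;
# same effect as plant[!is.na(animal)] <- NA
is.na(plant) <- !is.na(animal)

plant
```

If `plant` could hold values other than "ivy" that should survive next to "cat", this shortcut would be too aggressive and the explicit two-condition test would be needed.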
The problem here is that == does not work intuitively with NA values in R.
> df[df$animal=="cat",]
     animal plant value
NA     <NA>  <NA>    NA
NA.1   <NA>  <NA>    NA
3       cat   ivy     3
4       cat  <NA>     4
NA.2   <NA>  <NA>    NA
Here, for example, five rows are returned instead of two, because NA == "ANYTHING" returns NA, and indexing a data frame with NA produces a row filled with NA.
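As a sketch of the behaviour described above, and of one base-R way around it that is not part of the original answer: `%in%` returns FALSE rather than NA for missing values, so it can be used directly as an index.

```r
animal <- c(NA, NA, "cat", "cat", NA)
plant  <- c("ivy", NA, "ivy", NA, NA)

animal == "cat"         # NA NA TRUE TRUE NA
animal %in% "cat"       # FALSE FALSE TRUE TRUE FALSE
which(animal == "cat")  # 3 4 (which() drops the NA positions)

# Character columns are an assumption here; the question uses factors
df <- data.frame(animal, plant, value = 1:5, stringsAsFactors = FALSE)

# NA-free mask: only row 3 has both "cat" and "ivy"
df$plant[df$animal %in% "cat" & df$plant %in% "ivy"] <- NA
df
```

Wrapping the `==` comparison in `which()` works for the same reason: the NA positions never reach the subsetting step.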
You could define this function, which returns TRUE if x and y are equal and not NA, or if both are NA.
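is.equal.force <- `%===%` <- function(x, y, vect = TRUE) {
  # NA-safe equality: TRUE when both are NA or both hold the same non-NA value.
  # If exactly one side is NA the result is FALSE (not NA), so it is safe as an index.
  res <- ifelse(is.na(x), is.na(y), ifelse(!is.na(y), x == y, FALSE))
  # With vect = FALSE, collapse to a single TRUE/FALSE for the whole vector
  if (!vect) {
    res <- all(res)
  }
  return(res)
}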
Then the solution to your problem becomes simply:
df[df$animal %===% "cat" & df$plant %===% "ivy", "plant"] <- NA
df
  animal plant value
1   <NA>   ivy     1
2   <NA>  <NA>     2
3    cat  <NA>     3
4    cat  <NA>     4
5   <NA>  <NA>     5
Note that the correct syntax was used here.