
Recoding a variable by a double conditional in dplyr using mutate and case_when

I'm trying to correct errors made in the field when field workers recorded which tree species (SORTNR) were planted at which sites (Siteid). Unfortunately, the mistakes are not the same across sites.

What I am trying to express in my code is: when Siteid and SORTNR are a specific combination, replace SORTNR with the correct value. However, when I then inspect the data, all SORTNR values are NA.

If I break it down and run only one of the recoding blocks, it appears that SORTNR is set to NA for every combination not included in the call, and that running both blocks leads to all combinations being set to NA.

How do I prevent combinations that aren't mentioned from being changed to NA? Is there a way to avoid explicitly stating that I want to replace correct values with themselves?

Sample data:

Siteid <- c(rep("F410", 10), "F411","F411","F411","F411","F411")
SORTNR <- c(1,2,4,5,8,9,10,11,12,2,12,14,28,15,12)
# Build the data frame directly; cbind() would coerce SORTNR to character
Dataframe <- data.frame(Siteid, SORTNR)

Recoding:

library(dplyr)

# Recoding block 1
Dataframe <- Dataframe %>% mutate(SORTNR=case_when(
  Siteid=="F410" & SORTNR==1~2,
  Siteid=="F410" & SORTNR==2~2,
  Siteid=="F410" & SORTNR==4~28,
  Siteid=="F410" & SORTNR==5~28,
  Siteid=="F410" & SORTNR==8~28,
  Siteid=="F410" & SORTNR==9~28,
  Siteid=="F410" & SORTNR==10~27,
  Siteid=="F410" & SORTNR==11~28,
  Siteid=="F410" & SORTNR==12~28))

# Recoding block 2
Dataframe <- Dataframe %>% mutate(SORTNR=case_when(
  Siteid=="F411" & SORTNR==12~13,
  Siteid=="F411" & SORTNR==28~29,
  Siteid=="F411" & SORTNR==14~14,
  Siteid=="F411" & SORTNR==15~15))

Values that don't match any condition in a case_when() call are assigned NA, so you need to add TRUE ~ SORTNR as the final condition. It catches every remaining row and keeps its original value.
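To see the fall-through behaviour in isolation, here is a minimal sketch on a made-up vector. The names x, no_fallback and with_fallback are hypothetical and not part of the question's data.

```r
library(dplyr)

# Hypothetical toy vector, not the question's data
x <- c(1, 2, 3)

# No fallback: elements that match no condition become NA
no_fallback <- case_when(x == 1 ~ 10)

# With a fallback: TRUE matches everything left over, so x is kept as-is
with_fallback <- case_when(x == 1 ~ 10, TRUE ~ x)

print(no_fallback)
print(with_fallback)
```

In dplyr 1.1.0 and later, the `.default` argument of case_when() is an equivalent way to write the same fallback.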

library(dplyr)

Dataframe %>%
  mutate(SORTNR = case_when(Siteid=="F410" & SORTNR %in% c(1,2) ~ 2,
                            Siteid=="F410" & SORTNR %in% c(4,5,8,9,11,12) ~ 28,
                            Siteid=="F410" & SORTNR == 10 ~ 27,
                            Siteid=="F411" & SORTNR == 12 ~ 13,
                            Siteid=="F411" & SORTNR == 28 ~ 29,
                            Siteid=="F411" & SORTNR == 14 ~ 14,
                            Siteid=="F411" & SORTNR == 15 ~ 15,
                            TRUE ~ SORTNR))
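If the list of corrections grows, one possible alternative is to keep them in a lookup table and join it on, which also avoids listing values that are already correct. The following is only a sketch under assumptions: the table name corrections and the column SORTNR_fixed are hypothetical, and the sample data is rebuilt without cbind() so that SORTNR stays numeric.

```r
library(dplyr)

# Sample data from the question, built without cbind() so SORTNR stays numeric
Dataframe <- data.frame(
  Siteid = c(rep("F410", 10), rep("F411", 5)),
  SORTNR = c(1, 2, 4, 5, 8, 9, 10, 11, 12, 2, 12, 14, 28, 15, 12)
)

# Hypothetical lookup table: one row per (site, wrong code) pair that needs fixing.
# Pairs that are already correct (e.g. F410/2, F411/14) are simply left out.
corrections <- data.frame(
  Siteid       = c(rep("F410", 8), rep("F411", 2)),
  SORTNR       = c(1, 4, 5, 8, 9, 10, 11, 12, 12, 28),
  SORTNR_fixed = c(2, 28, 28, 28, 28, 27, 28, 28, 13, 29)
)

recoded <- Dataframe %>%
  # Rows without a matching correction get NA in SORTNR_fixed
  left_join(corrections, by = c("Siteid", "SORTNR")) %>%
  # coalesce() takes the corrected value where present, else the original
  mutate(SORTNR = coalesce(SORTNR_fixed, SORTNR)) %>%
  select(-SORTNR_fixed)

print(recoded)
```

Because each (Siteid, SORTNR) pair appears at most once in the lookup table, the join does not add rows, and unlisted combinations pass through unchanged.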
