Add column based on multiple conditions with mutate() in tidy R grep
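I am trying to classify the rows of a wide data frame in an added column, based on a threshold (>0) applied across multiple columns. Previous examples on SO require the full column names along with ifelse() statements using > and ==. However, I need to be able to use grep() or contains() to isolate columns based on a common string.
Input data frame: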
library(tidyverse)
df <- data.frame(
  "ID" = c("asdf","vfdkun", "seifu", "seijd", "qweri"),
  "elephant_zoo" = c(1,1,1,2,0), #Should not be useful there
  "rhino_zoo" = c(1,2,3,1,0), #Should not be useful there
  "hippo_zoo" = c(1,1,0,0,0),
  "elephant_wild_A" = c(0,0,1,1,3),
  "rhino_wild_A" = c(0,0,4,3,1),
  "elephant_wild_B" = c(0,0,0,0,0),
  "rhino_wild_C" = c(0,0,1,5,7),
  "hippo_wild_B" = c(0,0,0,0,0)) %>%
  column_to_rownames(var = "ID")
df
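In reality, this has many more columns and rows!
The desired output data frame has a classification of the rows (ZOO and WILD) and a compilation of these classifications (CLASSIFICATION).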
df_goal <- data.frame(
  "ID" = c("asdf","vfdkun", "seifu", "seijd", "qweri"),
  "elephant_zoo" = c(1,1,1,2,0), #Should not be useful there
  "rhino_zoo" = c(1,2,3,1,0), #Should not be useful there
  "hippo_zoo" = c(1,1,0,0,0),
  "elephant_wild_A" = c(0,0,1,1,3),
  "rhino_wild_A" = c(0,0,4,3,1),
  "elephant_wild_B" = c(0,0,0,0,0),
  "rhino_wild_C" = c(0,0,1,5,7),
  "hippo_wild_B" = c(0,0,0,0,0)) %>%
  column_to_rownames(var = "ID") %>%
  add_column(ZOO = c("zoo", "zoo", "zoo", "zoo", "")) %>%
  add_column(WILD = c("", "", "wild", "wild", "wild")) %>%
  add_column(CLASSIFICATION = c("zoo only", "zoo only", "both", "both", "wild only"))
df_goal
I was hoping to use mutate() and case_when(), but I cannot get the selection of multiple columns right. Attempted examples:
# using an if else statement
df %>%
  mutate(ZOO = ifelse(select(contains("zoo")) > 0, "zoo", "F"))
# using mutate and case_when
df %>%
  mutate(ZOO = case_when(
    select(contains("zoo")) > 0 ~ "zoo",
    TRUE ~ ""))
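My actual data frame has many more categories, so it would be good to be able to break this down into ZOO vs. WILD first and then follow up with the compiled column.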
You can try using reduce() from the purrr package. An intermediate helper function, any_cols, makes the code clearer and can be combined with across():
library(tidyverse)
any_cols <- function(df) reduce(df, `|`)
df %>%
  mutate(ZOO = ifelse(any_cols(across(contains("zoo"), ~ . > 0)), "zoo", "F"))
  elephant_zoo rhino_zoo hippo_zoo elephant_wild_A rhino_wild_A elephant_wild_B rhino_wild_C hippo_wild_B ZOO
1            1         1         1               0            0               0            0            0 zoo
2            1         2         1               0            0               0            0            0 zoo
3            1         3         0               1            4               0            1            0 zoo
4            2         1         0               1            3               0            5            0 zoo
5            0         0         0               3            1               0            7            0   F
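To see why the helper works, here is a minimal sketch of what reduce() does with `|` on a set of logical columns. The toy data frame `flags` and its column names are made up for illustration and stand in for what across(contains("zoo"), ~ . > 0) returns.

```r
library(purrr)

# Hypothetical toy data: one logical column per matched variable,
# as produced by a comparison such as `. > 0`
flags <- data.frame(
  x = c(TRUE,  FALSE, FALSE),
  y = c(FALSE, FALSE, TRUE),
  z = c(FALSE, FALSE, FALSE)
)

# A data frame is a list of columns, so reduce() folds them pairwise:
# (x | y) | z, giving one logical value per row
row_any <- reduce(flags, `|`)
row_any
# [1]  TRUE FALSE  TRUE

# Base R's Reduce() performs the same fold
identical(row_any, Reduce(`|`, flags))
```

Each element of the result is TRUE when at least one column is TRUE in that row, which is the "any column above the threshold" test the helper is used for.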
The same condition also works inside case_when():
df %>%
  mutate(ZOO =
    case_when(any_cols(across(contains("zoo"), ~ . > 0)) ~ "zoo",
              TRUE ~ "F"))
  elephant_zoo rhino_zoo hippo_zoo elephant_wild_A rhino_wild_A elephant_wild_B rhino_wild_C hippo_wild_B ZOO
1            1         1         1               0            0               0            0            0 zoo
2            1         2         1               0            0               0            0            0 zoo
3            1         3         0               1            4               0            1            0 zoo
4            2         1         0               1            3               0            5            0 zoo
5            0         0         0               3            1               0            7            0   F
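The answer above only builds the ZOO column. As a rough sketch, the same helper could probably be extended to produce WILD and the compiled CLASSIFICATION column the question asks for. Two things here are assumptions rather than part of the original answer: the empty string "" is used instead of "F" to match the desired output, and a "neither" label is added for rows with no positive value in either group.

```r
library(tidyverse)

# Same input data as in the question
df <- data.frame(
  ID              = c("asdf", "vfdkun", "seifu", "seijd", "qweri"),
  elephant_zoo    = c(1, 1, 1, 2, 0),
  rhino_zoo       = c(1, 2, 3, 1, 0),
  hippo_zoo       = c(1, 1, 0, 0, 0),
  elephant_wild_A = c(0, 0, 1, 1, 3),
  rhino_wild_A    = c(0, 0, 4, 3, 1),
  elephant_wild_B = c(0, 0, 0, 0, 0),
  rhino_wild_C    = c(0, 0, 1, 5, 7),
  hippo_wild_B    = c(0, 0, 0, 0, 0)
) %>%
  column_to_rownames(var = "ID")

# Helper from the answer: TRUE for a row if any column is TRUE
any_cols <- function(df) reduce(df, `|`)

result <- df %>%
  mutate(
    # One flag column per group, selected by the shared string
    ZOO  = ifelse(any_cols(across(contains("zoo"),  ~ . > 0)), "zoo",  ""),
    WILD = ifelse(any_cols(across(contains("wild"), ~ . > 0)), "wild", ""),
    # Compile the two flags; conditions are checked in order
    CLASSIFICATION = case_when(
      ZOO == "zoo" & WILD == "wild" ~ "both",
      ZOO == "zoo"                  ~ "zoo only",
      WILD == "wild"                ~ "wild only",
      TRUE                          ~ "neither"  # assumption: not in the question
    )
  )

# For the sample data this gives, row by row:
# zoo only, zoo only, both, both, wild only
result$CLASSIFICATION
```

Note that contains() ignores case by default, so running the same pipeline again on `result` would also match the new ZOO and WILD columns; a stricter selector such as ends_with("_zoo") may be safer in that situation. With dplyr 1.0.4 or later, if_any(contains("zoo"), ~ . > 0) should be able to replace the any_cols(across(...)) call.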