Fill column with unite() using mutate() and a case_when() statement in R, tidyverse
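I have a list of names, plus thresholds assigned to those names that determine whether I am confident in the assigned name.
You can recreate the test dataset with: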
df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"),
level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"),
level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"),
value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))
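I want to use tidyverse mutate() and case_when() to find the taxonomic level that passes the appropriate threshold. The tidyverse statement below splits out the thresholds and then tries to do this. My bottleneck is the case_when() versus ifelse() statement; would ifelse() be more appropriate? I also want to use the level1:level3 syntax, since doing this any other way would be painful!
df_updated <- df %>%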
separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>%
mutate(Name_updated = case_when(
threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
TRUE ~ level1)) %>% #Otherwise fill with only level 1
data.frame
Desired output
> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata
The next step is to write a function that lets the user specify the thresholds used in the script, so I really need a way to test which thresholds pass.
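The issue is with unite() and also with the type of the separated columns. By default, convert = FALSE, so each new column is of character class, and a comparison such as threshold4 >= 50 is then done alphabetically rather than numerically. In addition, unite() takes a whole data frame and returns a data frame, so it cannot build a column inside mutate(); the approach here pastes the selected columns together with reduce() and str_c() instead.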
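A small illustration of the type problem, assuming only that tidyr is installed. The toy data frame x is made up for the demo; it shows that a character threshold compares alphabetically against a number, and that convert = TRUE in separate() gives integer columns instead.

```r
library(tidyr)

# A number compared with a string is coerced to character,
# so the comparison is alphabetical: "100" sorts before "50".
"100" >= 50              # FALSE
as.numeric("100") >= 50  # TRUE

# Hypothetical toy data with two thresholds per row
x <- data.frame(value = c("100;5", "90;50"))
chr <- separate(x, value, c("t1", "t2"), sep = ";")                  # default convert = FALSE
num <- separate(x, value, c("t1", "t2"), sep = ";", convert = TRUE)  # numeric conversion

class(chr$t1)   # "character"
class(num$t1)   # "integer"
chr$t1 >= 50    # FALSE TRUE  (alphabetical)
num$t1 >= 50    # TRUE TRUE   (numeric)
```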
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>%
  type.convert(as.is = TRUE) %>%
  # convert = TRUE turns the split thresholds into integers, so >= compares numbers
  separate(value, c("threshold1", "threshold2",
                    "threshold3", "threshold4"), sep = ";", convert = TRUE) %>%
  mutate(Name_updated =
    case_when(
      # `.` is the data frame piped into mutate(); reduce(str_c, sep = ";")
      # pastes the selected level columns together row by row
      threshold4 >= 50 ~
        select(., starts_with('level')) %>%
        reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 >= 60 ~
        select(., level1:level3) %>%
        reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~
        select(., level1:level2) %>%
        reduce(str_c, sep = ";"),
      TRUE ~ level1))
# level1 level2 level3 level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta Fungi Basidiomycota 100 5 4 2
#2 Eukaryota Alveolata Ciliophora Spirotrichea 100 100 100 100
#3 Eukaryota Opisthokonta Fungi Basidiomycota 100 80 60 50
#4 Eukaryota Alveolata Ciliophora Spirotrichea 90 50 40 40
#5 Eukaryota Alveolata Dinoflagellata Dinophyceae 100 80 20 0
# Name_updated
#1 Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3 Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4 Eukaryota;Alveolata
#5 Eukaryota;Alveolata
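For the next step in the question (letting the user choose the thresholds), one possible sketch is a function that takes the cutoffs as an argument. The name assign_name() and the cutoffs argument are hypothetical, not part of the answer above. It assumes the value column holds one threshold per level, in order, and that cutoffs is named after level2, level3, level4. It picks the deepest level whose threshold meets its cutoff (falling back to level1), which gives the same result as the first-match order of the case_when() above. It uses tidyr::separate() as in the answer and base R for the rest.

```r
library(tidyr)

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"),
                 level2 = c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"),
                 level3 = c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
                 level4 = c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"),
                 value  = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0"))

# Hypothetical helper: `cutoffs` holds the minimum threshold required
# for each level below level1, named after the level columns.
assign_name <- function(data, cutoffs = c(level2 = 50, level3 = 60, level4 = 50)) {
  lvl_cols <- c("level1", names(cutoffs))
  thr_cols <- paste0("threshold", seq_along(lvl_cols))

  # convert = TRUE so the thresholds are compared as numbers
  out <- separate(data, value, thr_cols, sep = ";", convert = TRUE)

  # Depth = deepest level whose threshold meets its cutoff, else 1.
  # Going from shallow to deep lets a deeper passing level overwrite.
  depth <- rep(1L, nrow(out))
  for (i in seq_along(cutoffs)) {
    k <- i + 1L
    depth[out[[thr_cols[k]]] >= cutoffs[[i]]] <- k
  }

  # Column k holds "level1;...;levelk" for every row
  lv <- lapply(out[lvl_cols], as.character)
  name_mat <- matrix(NA_character_, nrow(out), length(lvl_cols))
  name_mat[, 1] <- lv[[1]]
  for (k in seq_along(lvl_cols)[-1]) {
    name_mat[, k] <- paste(name_mat[, k - 1], lv[[k]], sep = ";")
  }

  # For each row, take the cumulative name at its chosen depth
  out$Name_updated <- name_mat[cbind(seq_len(nrow(out)), depth)]
  out
}

assign_name(df)$Name_updated
assign_name(df, c(level2 = 90, level3 = 90, level4 = 90))$Name_updated
```

With the default cutoffs this reproduces the Name_updated column shown above; raising all cutoffs to 90 leaves only row 2 named to level4 and the rest at Eukaryota.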