
Fill column with unite() using mutate() and case_when() statement in R, tidyverse

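I have a list of names, with thresholds assigned to those names, to determine whether the assigned name is a good fit.

You can recreate a test dataset with: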

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"), 
             level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"), 
             level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
             level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"), 
             value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))

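I want to use the tidyverse functions mutate() and case_when() to find the taxonomic level that passes a suitable threshold. The tidyverse statement below splits up the thresholds and then attempts this. My bottlenecks:

  1. Choosing between case_when() and ifelse() statements - would ifelse() be more appropriate?
  2. I don't know how to fill a new column named Name_updated with the concatenated level1-levelX. As written, unite() is not appropriate, because it operates on the whole dataset. In reality I have many more columns, so doing this without the tidyverse level1:level3 syntax would be painful!
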
df_updated <- df %>% 
  separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>% 
  mutate(Name_updated = case_when(
    threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
    threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
    TRUE ~ level1)) %>% #Otherwise fill with only level 1
  data.frame

Desired output

> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata

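The next step will be to write a function that lets the user specify the thresholds used in the script, so I really need a way to determine which threshold passes.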

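The problem lies with unite() and with the type of the columns created by separate(). By default convert = FALSE, so the new threshold columns are of class character, and a comparison such as threshold4 >= 50 is then made on strings rather than numbers. unite() also expects a data frame as its first argument, so it cannot be applied to a column range inside case_when(). The code below passes convert = TRUE to separate() and builds each name by selecting the level columns and pasting them together with reduce(str_c, sep = ";").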
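To see why the column type matters, here is a minimal base R sketch. The vector threshold4 is a stand-in for the fourth threshold column of the test data as separate() would return it with convert = FALSE; the names as_text and as_number are only illustrative.

```r
# Fourth threshold of each test row, as character strings
threshold4 <- c("2", "100", "50", "40", "0")

# Comparing character with a number coerces 50 to "50",
# so the test is a string comparison: "100" sorts before "50"
as_text <- threshold4 >= 50

# After converting to numeric, 100 >= 50 holds as expected
as_number <- as.numeric(threshold4) >= 50

print(data.frame(threshold4, as_text, as_number))
```

The two result columns disagree for the value "100", which is the kind of row the original case_when() would have sent to the wrong branch.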

library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>% 
  type.convert(as.is = TRUE) %>%
  separate(value, c("threshold1","threshold2", 
          "threshold3", "threshold4"), sep =";", convert = TRUE) %>% 
  mutate(Name_updated = 
     case_when(
      threshold4 >= 50 ~
         select(., starts_with('level')) %>% 
            reduce(str_c, sep=";"),
       threshold4 < 50 & threshold3 >= 60 ~ 
          select(., level1:level3) %>%
            reduce(str_c, sep=";"), 
       threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ 
          select(., level1:level2) %>% 
            reduce(str_c, sep=";"), 
      TRUE ~ level1))
#  level1       level2         level3        level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta          Fungi Basidiomycota        100          5          4          2
#2 Eukaryota    Alveolata     Ciliophora  Spirotrichea        100        100        100        100
#3 Eukaryota Opisthokonta          Fungi Basidiomycota        100         80         60         50
#4 Eukaryota    Alveolata     Ciliophora  Spirotrichea         90         50         40         40
#5 Eukaryota    Alveolata Dinoflagellata   Dinophyceae        100         80         20          0
#                                 Name_updated
#1                                   Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3  Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4                         Eukaryota;Alveolata
#5                         Eukaryota;Alveolata
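
For the follow-up step mentioned in the question (letting the user specify the thresholds), one possible sketch is a small wrapper function. It is not part of the answer above: the function name name_by_threshold and its cutoffs and sep arguments are hypothetical, it uses base R only, and it assumes the rule implied by the case_when() branches, namely that each name is kept down to the deepest level whose threshold meets its cutoff.

```r
# Test data from the question
df <- data.frame(
  level1 = c("Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota", "Eukaryota"),
  level2 = c("Opisthokonta", "Alveolata", "Opisthokonta", "Alveolata", "Alveolata"),
  level3 = c("Fungi", "Ciliophora", "Fungi", "Ciliophora", "Dinoflagellata"),
  level4 = c("Basidiomycota", "Spirotrichea", "Basidiomycota", "Spirotrichea", "Dinophyceae"),
  value  = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0")
)

# Hypothetical helper: cutoffs[k] is the minimum threshold for level k + 1
# (level1 is always kept, so it has no cutoff)
name_by_threshold <- function(df, cutoffs = c(50, 60, 50), sep = ";") {
  level_cols <- grep("^level", names(df), value = TRUE)
  levels_chr <- as.matrix(df[level_cols])  # character matrix of names

  # Split "100;5;4;2" into a numeric matrix, one row per record
  parts <- strsplit(as.character(df$value), ";", fixed = TRUE)
  thr <- do.call(rbind, lapply(parts, as.numeric))
  stopifnot(ncol(thr) == length(level_cols),
            length(cutoffs) == length(level_cols) - 1)

  # passes[i, k] is TRUE when the threshold for level k + 1 meets its cutoff
  passes <- sweep(thr[, -1, drop = FALSE], 2, cutoffs, `>=`)

  # Deepest passing level per row; 1 when no level passes
  depth <- apply(passes, 1, function(p) if (any(p)) max(which(p)) + 1 else 1)

  # Paste level1 .. level<depth> for each row
  df$Name_updated <- vapply(seq_len(nrow(df)), function(i) {
    paste(levels_chr[i, seq_len(depth[i])], collapse = sep)
  }, character(1))
  df
}

result   <- name_by_threshold(df)                         # cutoffs from the question
stricter <- name_by_threshold(df, cutoffs = c(90, 90, 90)) # user-supplied cutoffs
print(result$Name_updated)
```

With the default cutoffs this should reproduce the Name_updated column shown above; passing a different cutoffs vector changes how deep each name goes without editing the function body.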
