How to choose the most voted category from multiple columns in R

I have a classification problem I need to solve using R, but to be honest I have no clue how to do it.

I have a table (see below) in which each sample is classified by three ML models (one per column). For each sample I need to pick the "most voted" category and write it to a new column.

Current table

[image]

Desired output

[image]

I have been reading about categorical variables in R, but nothing seems to fit my specific needs.

Any help would be highly appreciated.

Thanks in advance.

JL

This is not how you ask a question. Please see the relevant thread, and in the future provide the data in the form shown below (call dput() on your data frame, then copy and paste the result from the console). At any rate, here is a base R solution:

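# Calculate the modal value of each row: mode => character vector
# Name the prediction columns explicitly so that an existing mode column
# (such as the one in the data below) is not counted as an extra vote
pred_cols <- c("RFpred", "SVMpred", "KNNpred")
df1$mode <- apply(
  df1[, pred_cols],
  1,
  function(x) {
    # Count the votes in this row and keep the most frequent label
    names(sort(table(x), decreasing = TRUE))[1]
  }
)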
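
The base R solution above always returns a label, even when the three models disagree completely; in that case the winner is decided by sort order rather than by votes. If you would rather flag such rows, here is a minimal sketch. The helper majority_vote() and the toy votes data frame are made up for illustration and are not part of the original answer:

```r
# Hypothetical helper: majority vote that returns NA when no single
# label has strictly the most votes.
majority_vote <- function(x) {
  counts <- table(x)                                # votes per label; NA values are dropped
  if (length(counts) == 0) return(NA_character_)    # every prediction was missing
  winners <- names(counts)[counts == max(counts)]   # all labels sharing the top count
  if (length(winners) == 1) winners else NA_character_
}

# Toy rows: a clear 2-to-1 majority, then a three-way split
votes <- data.frame(
  RFpred  = c("Carrier", "Carrier"),
  SVMpred = c("Absent",  "Absent"),
  KNNpred = c("Carrier", "Helper"),
  stringsAsFactors = FALSE
)

votes$mode <- apply(votes, 1, majority_vote)
votes$mode
# "Carrier" NA
```

Whether NA is the right answer for a tie depends on the problem; falling back to the prediction of the model you trust most would be another reasonable choice.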

Data:

df1 <- structure(list(samples = c("S1", "D4", "S2", "D1", "D2", "S3", 
"D3", "S4"), RFpred = c("Carrier", "Absent", "Helper", "Helper", 
"Carrier", "Absent", "Resistant", "Carrier"), SVMpred = c("Absent", 
"Absent", "Helper", "Helper", "Carrier", "Helper", "Helper", 
"Resistant"), KNNpred = c("Carrier", "Absent", "Carrier", "Helper", 
"Carrier", "Absent", "Helper", "Resistant"), mode = c("Carrier", 
"Absent", "Helper", "Helper", "Carrier", "Absent", "Helper", 
"Resistant")), row.names = c(NA, -8L), class = "data.frame")

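For anyone unfamiliar with dput(): the structure(...) call above is the kind of text it prints. A small sketch of producing it yourself, using a made-up two-row data frame called toy (not the asker's real data):

```r
# Toy data frame standing in for a real table (values are illustrative only)
toy <- data.frame(
  samples = c("S1", "D4"),
  RFpred  = c("Carrier", "Absent"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object; paste that text into a post
dput(toy)

# Round trip: evaluating the printed code rebuilds the same data frame
rebuilt <- eval(parse(text = capture.output(dput(toy))))
identical(toy, rebuilt)
```

For a large table, dput(head(your_data, 20)) keeps the pasted text short while still giving answerers something they can run.
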
Tidyverse approach:

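library(dplyr)
library(tibble)

# Return the most frequent non-missing value of x (the first one seen wins a tie)
mode_char <- function(x) {
    ux <- unique(na.omit(x))
    ux[which.max(tabulate(match(x, ux)))]
}

# df1 is the data frame defined above; its precomputed mode column is
# dropped here only so that the result matches the output shown below
df1 %>%
    select(-mode) %>%
    as_tibble() %>%
    rowwise() %>%
    mutate(
        Vote = mode_char(c_across(RFpred:KNNpred))
    )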

#> # A tibble: 8 × 5
#> # Rowwise: 
#>   samples RFpred    SVMpred   KNNpred   Vote     
#>   <chr>   <chr>     <chr>     <chr>     <chr>    
#> 1 S1      Carrier   Absent    Carrier   Carrier  
#> 2 D4      Absent    Absent    Absent    Absent   
#> 3 S2      Helper    Helper    Carrier   Helper   
#> 4 D1      Helper    Helper    Helper    Helper   
#> 5 D2      Carrier   Carrier   Carrier   Carrier  
#> 6 S3      Absent    Helper    Absent    Absent   
#> 7 D3      Resistant Helper    Helper    Helper   
#> 8 S4      Carrier   Resistant Resistant Resistant
