How to select the most voted category from multiple columns in R

Question

I have a classification problem I need to solve using R, but to be honest I have no idea how to do it.

I have a table (see below) where different samples are classified by three ML models (one per column), and I need to choose the "most voted" category for each sample and write it to a new column.

Current table

Desired output

I have been reading about categorical variables in R, but nothing seems to fit my specific needs.

Any help would be highly appreciated.

Thanks in advance.

JL

Answer 1

This is not how you ask a question. Please see the relevant thread, and in future provide the data in the form shown below (run dput() and copy and paste the result from the console). At any rate, here is a base R solution:

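# Calculate the modal value for each row: mode => character vector
# Only the three prediction columns take part in the vote, so an existing
# "mode" column (as in the dput() output below) is not counted again
df1$mode <- apply(
  df1[, c("RFpred", "SVMpred", "KNNpred")],
  1,
  function(x) {
    # Tabulate the votes, sort by count, and keep the name of the top category
    head(names(sort(table(x), decreasing = TRUE)), 1)
  }
)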

Data:

df1 <- structure(list(samples = c("S1", "D4", "S2", "D1", "D2", "S3", 
"D3", "S4"), RFpred = c("Carrier", "Absent", "Helper", "Helper", 
"Carrier", "Absent", "Resistant", "Carrier"), SVMpred = c("Absent", 
"Absent", "Helper", "Helper", "Carrier", "Helper", "Helper", 
"Resistant"), KNNpred = c("Carrier", "Absent", "Carrier", "Helper", 
"Carrier", "Absent", "Helper", "Resistant"), mode = c("Carrier", 
"Absent", "Helper", "Helper", "Carrier", "Absent", "Helper", 
"Resistant")), row.names = c(NA, -8L), class = "data.frame")
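With three models, a row can end in a three-way tie, and the table()-based approach above will still return one of the tied categories. The following is a minimal sketch of one way to flag such rows instead. The data frame `votes` and the helper `vote_or_na` are hypothetical names invented for this illustration and are not part of the answer.

```r
# Hypothetical data: the first row has a clear winner, the second is a three-way tie
votes <- data.frame(
  RFpred  = c("Carrier", "Helper"),
  SVMpred = c("Absent",  "Carrier"),
  KNNpred = c("Carrier", "Absent"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single most frequent value, or NA on a tie
vote_or_na <- function(x) {
  counts  <- table(x)                              # votes per category
  winners <- names(counts)[counts == max(counts)]  # every category with the top count
  if (length(winners) == 1) winners else NA_character_
}

# Apply the helper row by row across the prediction columns
votes$mode <- apply(votes, 1, vote_or_na)
votes$mode
```

Returning NA is only one possible policy. Another option would be to fall back on the prediction of whichever model is trusted most.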

Answer 2

Tidyverse approach:

library(dplyr)
library(tibble)

# Most frequent non-NA value of x; on a tie, the value seen first wins
mode_char <- function(x) {
    ux <- unique(na.omit(x))
    ux[which.max(tabulate(match(x, ux)))]
}

df %>%
    as_tibble() %>%
    rowwise() %>%
    mutate(
        Vote = mode_char(c_across(RFpred:KNNpred))
    )

#> # A tibble: 8 × 5
#> # Rowwise: 
#>   samples RFpred    SVMpred   KNNpred   Vote     
#>   <chr>   <chr>     <chr>     <chr>     <chr>    
#> 1 S1      Carrier   Absent    Carrier   Carrier  
#> 2 D4      Absent    Absent    Absent    Absent   
#> 3 S2      Helper    Helper    Carrier   Helper   
#> 4 D1      Helper    Helper    Helper    Helper   
#> 5 D2      Carrier   Carrier   Carrier   Carrier  
#> 6 S3      Absent    Helper    Absent    Absent   
#> 7 D3      Resistant Helper    Helper    Helper   
#> 8 S4      Carrier   Resistant Resistant Resistant
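To see how `mode_char` behaves on edge cases that do not occur in the sample data, it can be called directly on small vectors. This is only an illustrative check; the input vectors are made up.

```r
# Same helper as in the answer above
mode_char <- function(x) {
    ux <- unique(na.omit(x))
    ux[which.max(tabulate(match(x, ux)))]
}

# Clear majority
mode_char(c("Carrier", "Absent", "Carrier"))
#> [1] "Carrier"

# Three-way tie: which.max() picks the first maximum, so the first value seen wins
mode_char(c("Helper", "Carrier", "Absent"))
#> [1] "Helper"

# NA values are dropped before counting
mode_char(c(NA, "Absent", "Helper", "Helper"))
#> [1] "Helper"
```

Because ties go to the first value seen, the order of the columns passed to `c_across()` decides the winner when all three models disagree.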

Answer 1: 1, accepted, 2022-07-07 10:56:03

Answer 2: 1, 2022-07-07 11:21:30
