Return column header if row values are above a certain threshold
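I have a data frame named tt. I want to create a new column called Ethnicity that holds, for each row, the name of the column whose value is above 80%. If no value in a row is greater than 80%, I want that row to contain the string "MIX".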
tt <- structure(list(INDIVIDUAL = c("SJL0253301", "SJL1073801", "SJL1066401",
"SJL1762813"), EUR = c(0.974378, 0.496489, 1e-05, 1e-05), EAS = c(0.010592,
0.438799, 0.99996, 1e-05), AMR = c(0.004699, 1e-05, 1e-05, 0.99996
), SAS = c(1e-05, 0.053618, 1e-05, 1e-05), AFR = c(0.010321,
0.011084, 1e-05, 1e-05)), row.names = c(1L, 44L, 19L, 911L), class = "data.frame")
Desired result:
INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
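We can use max.col to return, for each row, the index of the first column whose value is greater than 0.8, and then assign "MIX" to the rows where no value is greater than 0.8.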
tt$Ethnicity <- names(tt)[-1][max.col(tt[-1] > 0.8, "first")]
tt$Ethnicity[!rowSums(tt[2:6] > 0.8)] <- "MIX"
Output:
> tt
INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
1 SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
44 SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
19 SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
911 SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
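The second assignment in the max.col answer is easy to overlook, so here is a small sketch of why it is needed. The matrix `m` and the variables `idx`, `labels`, and `none_above` are made up for illustration and are not part of the original answer. On a row with no TRUE at all, every entry ties, and `ties.method = "first"` resolves the tie to column 1, so the row would otherwise be labelled with the first column name.

```r
# Toy logical matrix: row 1 has a TRUE in column 2, row 2 has no TRUE at all.
m <- matrix(c(FALSE, TRUE,  FALSE,
              FALSE, FALSE, FALSE),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("EUR", "EAS", "AMR")))

# max.col() treats the entries as 0/1. With ties.method = "first", an
# all-FALSE row is a three-way tie and resolves to column 1.
idx <- max.col(m, ties.method = "first")
labels <- colnames(m)[idx]
labels   # row 2 comes out as "EUR" even though nothing passed the test

# rowSums() counts the TRUEs per row; zero means no column qualified.
none_above <- rowSums(m) == 0
labels[none_above] <- "MIX"
labels   # "EAS" "MIX"
```

This mirrors the two lines of the answer: the first picks a column for every row, the second overwrites the rows where that pick is meaningless.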
Another possible solution, in base R:
cbind(tt, Ethnicity = apply(tt[-1] > 0.8, 1, \(x) if (any(x)) names(x)[x] else "MIX"))
#> INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
#> 1 SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
#> 44 SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
#> 19 SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
#> 911 SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
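One caveat with the apply() version: if a row ever had more than one column above the threshold (not possible at 0.8 when the values are proportions that sum to 1, but possible with a lower cutoff), `names(x)[x]` would have length greater than one and apply() would return a list rather than a character vector. A minimal sketch of a variant that always returns one string per row is below. The helper name `label_rows`, its arguments, and the `demo` data are hypothetical, and joining multiple hits with "/" is an assumption rather than something the question asks for.

```r
# Hypothetical helper (not from the original answers): label each row with the
# column(s) above `threshold`, or `fallback` when no column qualifies.
label_rows <- function(df, cols, threshold = 0.8, fallback = "MIX") {
  above <- as.matrix(df[cols]) > threshold
  # vapply() enforces exactly one character value per row.
  vapply(seq_len(nrow(above)), function(i) {
    hits <- colnames(above)[above[i, ]]
    if (length(hits) == 0) fallback else paste(hits, collapse = "/")
  }, character(1))
}

# Made-up data for illustration only.
demo <- data.frame(ID  = c("a", "b", "c"),
                   EUR = c(0.90, 0.45, 0.10),
                   EAS = c(0.05, 0.45, 0.10),
                   AMR = c(0.05, 0.10, 0.80))

label_rows(demo, c("EUR", "EAS", "AMR"))                   # "EUR" "MIX" "MIX"
label_rows(demo, c("EUR", "EAS", "AMR"), threshold = 0.4)  # "EUR" "EUR/EAS" "AMR"
```

Note that row "c" is "MIX" at the default threshold because 0.80 is not strictly greater than 0.8.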
Here is a tidyverse approach:
library(dplyr)
library(tidyr)
tt %>%
mutate(across(-INDIVIDUAL, ~case_when(. > 0.8 ~ cur_column()), .names = "new_{.col}")) %>%
unite(Ethnicity, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(Ethnicity = ifelse(Ethnicity== "", "MIX", Ethnicity))
INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
1 SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
44 SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
19 SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
911 SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
Here is another option:
library(dplyr)
tt %>%
rowwise() %>%
# <= (not <) so that a value of exactly 0.8 counts as "MIX", consistent with "greater than 0.8"
mutate(Ethnicity = ifelse(all(c_across(-INDIVIDUAL) <= 0.8), "MIX", names(which.max(across(-INDIVIDUAL))))) %>%
ungroup()
Output:
INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
1 SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
44 SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
19 SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
911 SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
Here is a data.table approach:
library(data.table)
setDT(tt)[, Ethnicity := names(.SD)[unlist(.SD) > 0.8],
by = INDIVIDUAL][is.na(Ethnicity), Ethnicity := "MIX"]
Output:
INDIVIDUAL EUR EAS AMR SAS AFR Ethnicity
<char> <num> <num> <num> <num> <num> <char>
1: SJL0253301 0.974378 0.010592 0.004699 0.000010 0.010321 EUR
2: SJL1073801 0.496489 0.438799 0.000010 0.053618 0.011084 MIX
3: SJL1066401 0.000010 0.999960 0.000010 0.000010 0.000010 EAS
4: SJL1762813 0.000010 0.000010 0.999960 0.000010 0.000010 AMR
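Keep in mind that setDT() converts its argument to a data.table by reference, so tt itself is changed by the call above. If the original data frame should stay untouched, one option is to work on a copy made with as.data.table(). The sketch below assumes that preference. The names `df`, `dt`, and `cols` and the toy values are made up, and the labelling reuses the max.col idea from the first answer rather than the grouped assignment shown above.

```r
library(data.table)

# Made-up data for illustration only.
df <- data.frame(ID  = c("a", "b"),
                 EUR = c(0.9, 0.5),
                 EAS = c(0.1, 0.5))

dt   <- as.data.table(df)   # a copy; df remains a plain data.frame
cols <- c("EUR", "EAS")     # the columns to test against the threshold

# Build the labels for all rows at once: pick the first column above 0.8,
# and fall back to "MIX" for rows where no column qualifies.
dt[, Ethnicity := {
  m <- as.matrix(.SD) > 0.8
  ifelse(rowSums(m) > 0, names(.SD)[max.col(m, ties.method = "first")], "MIX")
}, .SDcols = cols]

dt$Ethnicity   # "EUR" "MIX"
names(df)      # still "ID" "EUR" "EAS"
```

Restricting `.SDcols` to the numeric columns also keeps an existing Ethnicity column out of the comparison if the code is run twice.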