
Convert df of characters into specific numbers

General:

I have a df of characters and want to convert them into numbers (for use in a heatmap).

Specific:

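I collected annotations for different genes and found that they disagree in many cases. Now I want to visualize this as a heatmap. To do so, I need to convert the character vectors of annotations into numbers. I tried converting them into factors, but that left me no control over which character is assigned to which number. Since controlling this matters, the factor conversion did not give the desired result.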
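For comparison, factor conversion can give that control if the level order is passed explicitly: `as.integer()` on a factor returns each value's position in `levels`, so the order of the `levels` vector fixes the mapping. A minimal sketch; `levels_order` and `anno_cols` are illustrative names, not from the question:

```r
df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

# The position of each label in this vector becomes its number
levels_order <- c("primary", "secondary", "tertiary", "ficolin-1", "secretory")
anno_cols <- c("annoA", "annoB", "annoC")

df_num <- df_char
# factor(x, levels = ...) uses our order instead of alphabetical order;
# as.integer() then returns the level index (NA stays NA)
df_num[anno_cols] <- lapply(df_char[anno_cols],
                            function(x) as.integer(factor(x, levels = levels_order)))
df_num
```

Any label missing from `levels_order` is silently turned into `NA`, so the vector should list every category that can occur.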

Starting df:

df_char <- data.frame(
 id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
 annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
 annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
 annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

Desired outcome:

df_num <- data.frame(
 id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
 annoA = c(1, 2, 3, 1, NA),
 annoB = c(1, 1, 3, 3, 3),
 annoC = c(1, 2, 2, 1, NA)
)

I tried an ifelse function, but to no avail:

granule_coverter <- function(df, col) {
 df$col <- ifelse(df$col == 'primary', 1, df$col)
 df$col <- ifelse(df$col == 'secondary', 2, df$col)
 df$col <- ifelse(df$col == 'tertiary', 3, df$col)
 df$col <- ifelse(df$col == 'ficolin-1', 4, df$col)
 df$col <- ifelse(df$col == 'secretory', 5, df$col)
 return(df)
}
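The attempt above has two problems: `df$col` looks for a column literally named `col` rather than the column whose name is stored in the argument `col`, and each `ifelse()` falls back to the original character values, so any recoded number is coerced back to a string such as "1". A minimal corrected sketch, keeping the question's function name and five categories, that indexes with `df[[col]]` and nests the `ifelse()` calls so the result stays numeric:

```r
df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

granule_coverter <- function(df, col) {
  # df[[col]] selects the column whose name is the string in `col`
  x <- df[[col]]
  # Every branch is numeric (or NA), so the result is a numeric vector.
  # NA inputs and labels not listed here both end up as NA.
  df[[col]] <- ifelse(x == "primary", 1,
               ifelse(x == "secondary", 2,
               ifelse(x == "tertiary", 3,
               ifelse(x == "ficolin-1", 4,
               ifelse(x == "secretory", 5, NA)))))
  df
}

# Apply the converter to each annotation column in turn
df_num <- df_char
for (col in c("annoA", "annoB", "annoC")) {
  df_num <- granule_coverter(df_num, col)
}
df_num
```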

You can use match():

library(dplyr)

df_char %>%
  mutate(across(starts_with("anno"),
         ~ match(.x, c('primary', 'secondary', 'tertiary'))))

#      id annoA annoB annoC
# 1 Gene1     1     1     1
# 2 Gene2     2     1     2
# 3 Gene3     3     3     2
# 4 Gene4     1     3     1
# 5 Gene5    NA     3    NA

Or dplyr::recode():

df_char %>%
  mutate(across(starts_with("anno"),
         ~ recode(.x, 'primary' = 1L, 'secondary' = 2L, 'tertiary' = 3L)))
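In dplyr 1.1.0 and later, `recode()` is marked as superseded in favour of `case_match()`, which takes `old ~ new` formulas and returns `NA` for anything unmatched (including `NA` input) unless `.default` is given. A sketch assuming dplyr >= 1.1.0 is installed:

```r
library(dplyr)  # case_match() needs dplyr >= 1.1.0

df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

# Each formula maps one label (left) to its number (right);
# unmatched values and NA become NA because no .default is set
df_num <- df_char %>%
  mutate(across(starts_with("anno"),
                ~ case_match(.x,
                             "primary" ~ 1,
                             "secondary" ~ 2,
                             "tertiary" ~ 3)))
df_num
```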

There are many ways to handle this task; one possible option is to use case_when() (from the dplyr package) on each column you want to recode, e.g.

library(dplyr)

df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

df_char %>%
  mutate(across(starts_with("anno"), ~case_when(
    .x == "primary" ~ 1,
    .x == "secondary" ~ 2,
    .x == "tertiary" ~ 3,
    TRUE ~ NA_real_
  )))
#>      id annoA annoB annoC
#> 1 Gene1     1     1     1
#> 2 Gene2     2     1     2
#> 3 Gene3     3     3     2
#> 4 Gene4     1     3     1
#> 5 Gene5    NA     3    NA

Created on 2023-01-16 with reprex v2.0.2


Another possible option is to create a "lookup table" of key-value pairs and use it to recode() the columns of interest, e.g.

library(dplyr)

df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

key_value_pairs <- c("primary" = 1, "secondary" = 2, "tertiary" = 3)
df_char %>%
  mutate(across(starts_with("anno"), ~recode(.x, !!!key_value_pairs)))
#>      id annoA annoB annoC
#> 1 Gene1     1     1     1
#> 2 Gene2     2     1     2
#> 3 Gene3     3     3     2
#> 4 Gene4     1     3     1
#> 5 Gene5    NA     3    NA

Created on 2023-01-16 with reprex v2.0.2
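The same lookup-table idea also works in base R without dplyr: indexing a named vector by the character values returns the matching numbers, and `NA` or any unlisted label yields `NA`. A minimal sketch; `anno_cols` is an illustrative name:

```r
df_char <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c('primary', 'secondary', 'tertiary', 'primary', NA),
  annoB = c('primary', 'primary', 'tertiary', 'tertiary', 'tertiary'),
  annoC = c('primary', 'secondary', 'secondary', 'primary', NA)
)

key_value_pairs <- c("primary" = 1, "secondary" = 2, "tertiary" = 3)

# Positions of all columns whose names start with "anno"
anno_cols <- grep("^anno", names(df_char))

df_num <- df_char
# key_value_pairs[x] looks each label up by name;
# unname() drops the names carried over from the lookup vector
df_num[anno_cols] <- lapply(df_char[anno_cols],
                            function(x) unname(key_value_pairs[x]))
df_num
```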

A base R option with match():

> df_char[-1] <- match(as.matrix(df_char[-1]), c("primary", "secondary", "tertiary", "ficolin-1", "secretory"))

> df_char
     id annoA annoB annoC
1 Gene1     1     1     1
2 Gene2     2     1     2
3 Gene3     3     3     2
4 Gene4     1     3     1
5 Gene5    NA     3    NA
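Since the end goal is a heatmap, one possible next step is to turn the numeric columns into a matrix with the gene ids as row names and pass it to base R's `heatmap()`. A sketch assuming the recoded data frame is called `df_num` (rebuilt here from the desired outcome so the block runs on its own):

```r
df_num <- data.frame(
  id = c('Gene1', 'Gene2', 'Gene3', 'Gene4', 'Gene5'),
  annoA = c(1, 2, 3, 1, NA),
  annoB = c(1, 1, 3, 3, 3),
  annoC = c(1, 2, 2, 1, NA)
)

# Numeric matrix: one row per gene, one column per annotation source
m <- as.matrix(df_num[-1])
rownames(m) <- df_num$id

# Rowv = NA and Colv = NA keep the original order (no clustering);
# scale = "none" plots the codes as they are. NA cells are not filled.
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```

Turning clustering off keeps the genes and annotation sources in their original order, which makes disagreements between columns easy to read row by row.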

