
How to merge two columns in a data frame in R?

This is my data frame:

library(tibble)

df <- tribble(
 ~ID, ~AKT1,  ~AKT3, ~BRCA1, ~BRCA2,
  1800018, FALSE,  TRUE, FALSE, FALSE,
  1800021, FALSE, FALSE, FALSE, FALSE,
  1800026, FALSE, FALSE, FALSE,  TRUE,
  1800027, FALSE, FALSE, FALSE, FALSE
)

And this is the pathway that each column name (gene) belongs to:

gene_to_pathway <- tribble(
  ~Gene,                   ~Pathway,
  "AKT1",                      "PI3K",
  "AKT3",                      "PI3K",
  "BRCA1",          "Genome Integrity",
  "BRCA2",          "Genome Integrity"
)

I want to merge the columns that belong to the same pathway, as listed in gene_to_pathway above.

The final data frame I want looks like this:

> df2
         PI3K Genome Integrity
1800018  TRUE            FALSE
1800021 FALSE            FALSE
1800026 FALSE             TRUE
1800027 FALSE            FALSE

Any help is appreciated!

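One way is to reshape to long format, attach each gene's pathway, and then collapse every ID/pathway pair:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-ID) %>%                                  # one row per ID/gene pair
  left_join(gene2pathaway, by = c("name" = "Gene")) %>%  # add the pathway of each gene
  group_by(ID, Pathway) %>%
  summarise(value = as.logical(sum(value)), .groups = "drop") %>%  # TRUE if any gene is TRUE
  pivot_wider(id_cols = ID, names_from = Pathway, values_from = value)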

       ID `Genome Integrity` PI3K 
    <int> <lgl>              <lgl>
1 1800018 FALSE              TRUE 
2 1800021 FALSE              FALSE
3 1800026 TRUE               FALSE
4 1800027 FALSE              FALSE
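As a side note, as.logical(sum(value)) is TRUE whenever at least one gene in the group is TRUE, so base R's any() expresses the same rule more directly. The two only differ when a group contains NA: sum() then yields NA, while any() still returns TRUE if a TRUE is present. The sketch below is a variant of the pipeline above, assuming the tribble data from the question (df and gene_to_pathway); it names the long-format column Gene so the join needs no renaming, and df2 is just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Example data from the question.
df <- tribble(
  ~ID,     ~AKT1, ~AKT3, ~BRCA1, ~BRCA2,
  1800018, FALSE, TRUE,  FALSE,  FALSE,
  1800021, FALSE, FALSE, FALSE,  FALSE,
  1800026, FALSE, FALSE, FALSE,  TRUE,
  1800027, FALSE, FALSE, FALSE,  FALSE
)
gene_to_pathway <- tribble(
  ~Gene,   ~Pathway,
  "AKT1",  "PI3K",
  "AKT3",  "PI3K",
  "BRCA1", "Genome Integrity",
  "BRCA2", "Genome Integrity"
)

df2 <- df %>%
  # One row per ID/gene pair; gene names go into a column called Gene.
  pivot_longer(-ID, names_to = "Gene", values_to = "value") %>%
  # Attach each gene's pathway (both tables share the Gene column).
  left_join(gene_to_pathway, by = "Gene") %>%
  # any() is TRUE when at least one gene of the pathway is TRUE for that ID.
  group_by(ID, Pathway) %>%
  summarise(value = any(value), .groups = "drop") %>%
  # Back to one column per pathway; ID is kept as the identifier column.
  pivot_wider(names_from = Pathway, values_from = value)

df2
```

Passing .groups = "drop" returns an ungrouped tibble and silences the summarise() grouping message.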

Alternatively, you can pass the summarising function to pivot_wider() through its values_fn argument, which saves the group_by() and summarise() steps:

library(tibble)  # for column_to_rownames()

df %>%
  pivot_longer(-ID) %>%
  left_join(gene2pathaway, by = c("name" = "Gene")) %>%
  pivot_wider(id_cols = ID, names_from = Pathway, values_from = value,
              values_fn = function(x) as.logical(sum(x))) %>%  # collapse the genes of each pathway
  column_to_rownames("ID")                                     # use the IDs as row names

         PI3K Genome Integrity
1800018  TRUE            FALSE
1800021 FALSE            FALSE
1800026 FALSE             TRUE
1800027 FALSE            FALSE
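If you would rather avoid reshaping altogether, a base R sketch is to split the gene names by pathway and, for each pathway, test whether any of its gene columns is TRUE in a row. This is a different technique from the answers above; it assumes every gene in the lookup table exists as a column of df, and genes_by_pathway is a hypothetical helper name.

```r
# Example data from the question, built with base R.
df <- data.frame(
  ID    = c(1800018, 1800021, 1800026, 1800027),
  AKT1  = c(FALSE, FALSE, FALSE, FALSE),
  AKT3  = c(TRUE,  FALSE, FALSE, FALSE),
  BRCA1 = c(FALSE, FALSE, FALSE, FALSE),
  BRCA2 = c(FALSE, FALSE, TRUE,  FALSE)
)
gene_to_pathway <- data.frame(
  Gene    = c("AKT1", "AKT3", "BRCA1", "BRCA2"),
  Pathway = c("PI3K", "PI3K", "Genome Integrity", "Genome Integrity")
)

# Named list: pathway -> gene columns (split() orders the pathways alphabetically).
genes_by_pathway <- split(gene_to_pathway$Gene, gene_to_pathway$Pathway)

# For each pathway, rowSums(...) > 0 is TRUE when any of its gene columns is TRUE.
df2 <- sapply(genes_by_pathway,
              function(genes) rowSums(df[, genes, drop = FALSE]) > 0)

# Use the IDs as row names and return a data frame, as in the answers above.
rownames(df2) <- df$ID
df2 <- as.data.frame(df2)
df2
```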

Input data used:

df <- read.table(text = " ID   AKT1  AKT3 BRCA1 BRCA2
 1  1800018 FALSE  TRUE FALSE FALSE
 2  1800021 FALSE FALSE FALSE FALSE
 3  1800026 FALSE FALSE FALSE  TRUE
 4  1800027 FALSE FALSE FALSE FALSE", header = TRUE)

gene2pathaway <- read.table(text = "        Gene           Pathway
      1      AKT1                      PI3K
      2      AKT3                      PI3K
      3     BRCA1          'Genome Integrity'
      4     BRCA2          'Genome Integrity'", header = TRUE)
