简体   繁体   中英

Combine same header columns in R data frame

My input data frame contains more than 100 columns and rows. I want to combine the columns if their header is the same.

Following is my input data frame

Case.ID HRAS    TP53    MAP3K1  MAP3K1  TP53
TCGA_1          MSE;                    MSE;
TCGA_2  MUT;                  
TCGA_3                    
TCGA_4          MUT;                    AMP;
TCGA_5                  MSE;          
TCGA_6                    
TCGA_7  MUT;                  
TCGA_8  MUT;    AMP;              
TCGA_9  MUT;                  
TCGA_10                   
TCGA_11                 FRM;    st_gai;   
TCGA_12                                 HDEL;
          

Expected output

Case.ID HRAS    TP53      MAP3K1
TCGA_1          MSE;      
TCGA_2  MUT;          
TCGA_3            
TCGA_4          MUT;AMP;      
TCGA_5                    MSE;
TCGA_6            
TCGA_7  MUT;          
TCGA_8  MUT;    AMP;      
TCGA_9  MUT;          
TCGA_10           
TCGA_11                  FRM;st_gai;
TCGA_12         HDEL;     

In the expected output, you can see I have combined the same header columns in such a way that if they have the same entry present in the row then it will print only once and if different entries are present in a row then all entries will be taken together. Here they just combine the selected column Combine two or more columns in a dataframe into a new column with a new name

You can try putting it into long format then after adjusting the names with str_remove we pivot wider, using the values_fn we paste the values

librray(tidyr)

cbind(Case.ID = rownames(df), stack(df[-1])) |> 
  transform(ind = as.character(ind) |> stringr::str_remove_all(".1")) |> 
  pivot_wider(Case.ID, names_from = ind, values_from = values, 
              values_fn = \(x) paste(x, collapse = ""))
   Case.ID HRAS   TP53       MAP3         
   <chr>   <chr>  <chr>      <chr>        
 1 TCGA_1  "MSE;" "MSE;"     ""           
 2 TCGA_2  ""     ""         ""           
 3 TCGA_3  ""     ""         ""           
 4 TCGA_4  ""     "MUT;AMP;" ""           
 5 TCGA_5  ""     "MSE;"     ""           
 6 TCGA_6  ""     ""         ""           
 7 TCGA_7  ""     ""         ""           
 8 TCGA_8  "AMP;" ""         ""           
 9 TCGA_9  ""     ""         ""           
10 TCGA_10 ""     ""         ""           
11 TCGA_11 ""     ""         "FRM;st_gai;"
12 TCGA_12 ""     "HDEL;"    

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM