
R ggplot2 grouped boxplot of TCGA expression data

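Currently, I have gene expression data from TCGA and have loaded some of the genes into a data.frame like this (T stands for tumor samples, N for normal tissue samples):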

             Gene1  Gene2  Gene3 ...
Patient_T 1    2      3      1
Patient_T 2    1      5      6 
Patient_N 1    3      6      1
Patient_N 2    3      6      1
...

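I now want to create a grouped boxplot with ggplot2. The plot should show all candidate genes on the x-axis and, on the y-axis, the expression level of each gene grouped by tumor and normal.

Grouped boxplots have been covered in other threads, but those used a data.frame in a different format. I am just wondering whether there is a practical way to create the grouped plot from this data.frame format (i.e., with the Patient_ID as row names).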

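Overview

Note: biology is not my area of expertise at all, so please let me know if I have misinterpreted anything in the sample data set.

Transforming the data from wide to long format (one record per patient, tissue type, and gene) is the key to building a grouped boxplot with ggplot2. In your case, the row names of the data frame contain two pieces of information: the tissue type and the patient ID. After separating those into two columns, I gathered all of the Gene1, Gene2, and Gene3 columns into two columns: gene and expression_level. That is how the original 4 x 3 data frame turns into a 12 x 4 tidy data set.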
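As a minimal sketch of the reshaping just described, assuming tidyr 1.0.0 or later, where `pivot_longer()` is available as the successor to `gather()`. The toy values mirror the question's example, and the object names `wide` and `long` are illustrative only:

```r
library(tibble)
library(tidyr)

# toy data mirroring the question's example (4 samples x 3 genes)
wide <- data.frame(
  Gene1 = c(2, 1, 3, 3),
  Gene2 = c(3, 5, 6, 6),
  Gene3 = c(1, 6, 1, 1),
  row.names = c("Patient_T 1", "Patient_T 2", "Patient_N 1", "Patient_N 2")
)

# move the row names into a regular column (name "sample" is an assumption)
long <- rownames_to_column(wide, var = "sample")

# split e.g. "Patient_T 1" into type = "T" and pid = "1"
long <- tidyr::extract(long, col = "sample", into = c("type", "pid"),
                       regex = "_(T|N)\\s(\\d+)$")

# stack the gene columns: one row per patient, tissue type, and gene
long <- pivot_longer(long, cols = starts_with("Gene"),
                     names_to = "gene", values_to = "expression_level")

dim(long)  # 12 rows, 4 columns
```

The resulting columns are type, pid, gene, and expression_level, matching the 12 x 4 shape mentioned above.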

[Figure: screenshot of the grouped boxplot]

# load necessary packages ----
library(tidyverse)

# create example data ----
df <-
  data.frame(Gene1 = c(2, 1, 3, 3)
             , Gene2 = c(3, 5, 6, 6)
             , Gene3 = c(1, 6, 1, 1)
             , row.names = c("Patient_T 1"
                             , "Patient_T 2"
                             , "Patient_N 1"
                             , "Patient_N 2"))

# reshape data so that it contains one record per: ----
# - patient
# - gene
# - tissue type
tidy.df <-
  df %>%
  # pid for Patient ID
  rownames_to_column(var = "pid") %>%
  # only keep the suffix in pid, e.g. "T 1"
  # (\\d+ also matches patient IDs with more than one digit)
  mutate(pid = str_extract(pid, "(T|N)\\s\\d+")) %>%
  # split tissue type and patient ID into two different columns
  separate(col = "pid"
           , into = c("type", "pid")
           , sep = "\\s{1}") %>%
  gather(key = "gene"
         , value = "expression_level"
         , matches("Gene")) %>%
  # remove 'Gene' from gene column (keeps all digits, e.g. "10" for Gene10)
  # and specify the 'type' values
  mutate(gene = str_extract(gene, "\\d+")
         , type = case_when(
           type == "N" ~ "Normal"
           , type == "T" ~ "Tumor"
         )) %>%
  # arrange tibble by pid
  arrange(pid) %>%
  as_tibble()

# create a grouped boxplot with ggplot2 ----
# The graph should depict all the gene candidates 
# in the x-axis and the expression level 
# in the y-axis grouped by tumor and normal for each gene.
tidy.df %>%
  ggplot(aes(x = gene, y = expression_level, fill = gene)) +
  geom_boxplot() +
  # visualize the distribution of expression level by gene and tissue type
  # i.e. one set of boxplots for normal and one for tumor
  facet_wrap(facets = vars(type)) +
  ylab("Expression level") +
  labs(title = "Gene expression data by tissue type"
       , caption = "Source: TCGA")

# end of script #
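
If the tumor and normal boxes should sit side by side within each gene rather than in separate facets, one possible variation maps `fill` to the tissue type instead. The sketch below is an alternative to the facetted plot, not part of the original script. It rebuilds a small long-format data frame by hand (the name `long_df` is illustrative) so that it runs on its own:

```r
library(ggplot2)

# long-format toy data: one row per patient, tissue type, and gene
long_df <- data.frame(
  type = rep(c("Tumor", "Tumor", "Normal", "Normal"), times = 3),
  gene = rep(c("Gene1", "Gene2", "Gene3"), each = 4),
  expression_level = c(2, 1, 3, 3,   # Gene1
                       3, 5, 6, 6,   # Gene2
                       1, 6, 1, 1)   # Gene3
)

p <- ggplot(long_df, aes(x = gene, y = expression_level, fill = type)) +
  # boxes for the two tissue types are dodged side by side within each gene
  geom_boxplot(position = position_dodge(width = 0.8)) +
  labs(x = "Gene", y = "Expression level", fill = "Tissue type",
       title = "Gene expression by gene and tissue type")

p
```

With only two samples per group the boxes are very narrow summaries. On real TCGA data each box would summarize many more patients.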

Session info

R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils    
[5] datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2  forcats_0.3.0   stringr_1.3.1  
 [4] dplyr_0.7.6     purrr_0.2.5     readr_1.1.1    
 [7] tidyr_0.8.1     tibble_1.4.2    ggplot2_3.1.0  
[10] tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.4  haven_1.1.2      
 [3] lattice_0.20-38   colorspace_1.3-2 
 [5] htmltools_0.3.6   viridisLite_0.3.0
 [7] yaml_2.2.0        utf8_1.1.4       
 [9] rlang_0.3.0.1     pillar_1.3.0     
[11] glue_1.3.0        withr_2.1.2      
[13] modelr_0.1.2      readxl_1.1.0     
[15] bindr_0.1.1       plyr_1.8.4       
[17] munsell_0.5.0     gtable_0.2.0     
[19] cellranger_1.1.0  rvest_0.3.2      
[21] evaluate_0.11     labeling_0.3     
[23] knitr_1.20        fansi_0.3.0      
[25] broom_0.5.0       Rcpp_0.12.19     
[27] scales_1.0.0      backports_1.1.2  
[29] jsonlite_1.5      gridExtra_2.3    
[31] hms_0.4.2         digest_0.6.18    
[33] stringi_1.2.4     grid_3.5.2       
[35] rprojroot_1.3-2   cli_1.0.1        
[37] tools_3.5.2       magrittr_1.5     
[39] lazyeval_0.2.1    crayon_1.3.4     
[41] pkgconfig_2.0.2   xml2_1.2.0       
[43] lubridate_1.7.4   assertthat_0.2.0 
[45] rmarkdown_1.10    httr_1.3.1       
[47] rstudioapi_0.8    viridis_0.5.1    
[49] R6_2.2.2          nlme_3.1-137     
[51] compiler_3.5.2   
