How to plot correlation graphs with R^2 for a big data matrix?
I have a proteomics data matrix. In the data matrix, a different number of peptides was detected for each protein (the number of detectable peptides varies by protein).
Q1. How can I plot correlation graphs for each protein to compare how its peptides behave? I.e., for protein A, I have peptides a1-a3, and I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.
Sample data
structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = 
structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
Since the number of peptides varies for each protein, how can I compare each pair of peptides and save the faceted graph as single plots, so that I can select only the graphs I need?

"Hence peptide number varies for each protein, how can I compare each peptide and save the faceted graph into single plots, by this, I can select only required graphs."
I'm not entirely sure what you actually want to plot. A correlation plot of which quantities? Select only which required graphs?
Anyway, perhaps the following will help.
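library(GGally)
library(tidyverse)

# df is the sample data from the question (the structure(...) output)
df %>%
    gather(Sample, Value, -Protein, -Peptide) %>%  # long format: one row per peptide and sample
    spread(Peptide, Value) %>%                     # wide format: one column per peptide
    filter(Protein == "A") %>%                     # keep only the rows for protein A
    ggpairs(columns = 3:6)                         # columns 3:6 hold peptides a1 to a4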
Explanation: We reshape the data such that we have the Values for every Peptide in columns; then we filter the entries for Protein == "A" and use GGally::ggpairs to show pairwise correlation plots of the Values for every Peptide.
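Since the question title asks for R^2, here is a minimal base R sketch of how the pairwise values could be tabulated for every protein at once. It assumes that R^2 means the squared Pearson correlation between two peptides across samples (which equals the R^2 of a simple linear regression of one on the other). The data frame toy and the helper pairwise_r2 are hypothetical names used only for illustration; toy is a small random stand-in with the same layout as the sample data.

```r
# Hypothetical stand-in with the same layout as the sample data:
# columns Protein, Peptide, then one column per sample.
set.seed(1)
toy <- data.frame(
  Protein = c("A", "A", "A", "B"),
  Peptide = c("a1", "a2", "a3", "b1"),
  matrix(runif(4 * 6), nrow = 4,
         dimnames = list(NULL, paste0("Sample", 1:6))),
  stringsAsFactors = FALSE
)

# Hypothetical helper: long table of pairwise R^2 for one protein's peptides.
pairwise_r2 <- function(d) {
  if (nrow(d) < 2) return(NULL)              # a single peptide has no pairs
  m <- t(as.matrix(d[, -(1:2)]))             # rows = samples, columns = peptides
  colnames(m) <- d$Peptide
  r2 <- cor(m)^2                             # squared Pearson correlation
  idx <- which(upper.tri(r2), arr.ind = TRUE)  # each unordered pair once
  data.frame(Protein  = d$Protein[1],
             PeptideX = colnames(r2)[idx[, "row"]],
             PeptideY = colnames(r2)[idx[, "col"]],
             R2       = r2[idx])
}

# Apply the helper per protein and stack the results.
r2_table <- do.call(rbind, lapply(split(toy, toy$Protein), pairwise_r2))
print(r2_table)
```

Proteins with a single peptide (B in the toy data) contribute no rows, so the varying peptide count per protein needs no special handling.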
You have a lot of flexibility in customising the output plot of ggpairs (e.g. add regression lines, remove panels, etc.); I recommend taking a look at the GGally GitHub project page and at "Multiple regression lines in ggpairs".
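As one possible customisation, the sketch below passes a custom panel function to the lower argument of ggpairs so that each scatter panel gets a linear regression line. The function name lm_panel and the data frame wide are hypothetical; wide is a random stand-in for the reshaped data of one protein (one column per peptide).

```r
library(GGally)
library(ggplot2)

# Hypothetical stand-in: rows = samples, columns = peptides of one protein.
set.seed(1)
wide <- data.frame(a1 = runif(10), a2 = runif(10), a3 = runif(10))

# Hypothetical panel function: scatter plot plus a linear regression line.
lm_panel <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, ...)
}

# Use the custom panel for all continuous pairs below the diagonal.
p <- ggpairs(wide, lower = list(continuous = lm_panel))
print(p)
```

The upper panels keep the ggpairs default for continuous data, which reports the correlation coefficient rather than R^2.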
If you want to show correlation plots only for certain Peptides, you could do the following:
pep_of_interest <- c("a2", "a4")
df %>%
    gather(Sample, Value, -Protein, -Peptide) %>%
    spread(Peptide, Value) %>%
    filter(Protein == "A") %>%
    ggpairs(columns = match(pep_of_interest, colnames(.)))
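If the goal is one file per peptide pair, so that only the needed graphs can be picked afterwards, a minimal base R sketch could look like the following. It swaps ggpairs for plain base graphics, and the output folder, the file naming scheme and the data frame pep are assumptions made for illustration only.

```r
# Hypothetical stand-in: rows = samples, columns = peptides of one protein.
set.seed(1)
pep <- data.frame(a1 = runif(10), a2 = runif(10), a3 = runif(10))

# Assumed output folder (a temporary directory here).
out_dir <- file.path(tempdir(), "peptide_pairs")
dir.create(out_dir, showWarnings = FALSE)

# All unordered peptide pairs: a1-a2, a1-a3, a2-a3.
pairs_list <- combn(names(pep), 2, simplify = FALSE)

# Write one PDF per pair, with the regression line and R^2 in the title.
files <- vapply(pairs_list, function(p) {
  x <- pep[[p[1]]]
  y <- pep[[p[2]]]
  fit <- lm(y ~ x)
  r2 <- summary(fit)$r.squared
  f <- file.path(out_dir, sprintf("A_%s_vs_%s.pdf", p[1], p[2]))
  pdf(f, width = 5, height = 5)
  plot(x, y, xlab = p[1], ylab = p[2],
       main = sprintf("%s vs %s (R^2 = %.2f)", p[1], p[2], r2))
  abline(fit)
  dev.off()
  f
}, character(1))

print(basename(files))
```

Wrapping this in a loop over proteins would give one set of files per protein, whatever the number of peptides.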
Here is a solution using the corrplot library if you are looking for a visual representation of the correlations. A lot more plotting options are available in the library (take a look at the corrplot vignette).
# sample data
dd <- structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = 
structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
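# for Protein A, build a subset of the data (drop the Protein column)
tempdd <- dd[dd$Protein == "A", -1]
# keep the peptide names; $ returns a plain vector even when dd is a tibble
cc <- tempdd$Peptide
# transpose so that samples are rows and peptides are columns
tempdd <- t(tempdd[, -1])
colnames(tempdd) <- cc
# calculate the pairwise correlations between peptides across all samples
rr <- cor(tempdd)
# install.packages("corrplot")
library(corrplot)
# build the plot
corrplot(rr, method = 'circle')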
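To cover every protein in a large matrix rather than only protein A, the same steps could be wrapped in a loop. This is a sketch under assumptions: toy is a hypothetical random stand-in with the layout of the sample data, proteins with a single peptide are skipped because they have no pairs, and the corrplot call is guarded so that the correlation matrices are still computed if the package is not installed.

```r
# Hypothetical stand-in with the same layout as the sample data.
set.seed(1)
toy <- data.frame(
  Protein = c("A", "A", "A", "B", "C", "C"),
  Peptide = c("a1", "a2", "a3", "b1", "c1", "c2"),
  matrix(runif(6 * 8), nrow = 6,
         dimnames = list(NULL, paste0("Sample", 1:8))),
  stringsAsFactors = FALSE
)

# One peptide-by-peptide correlation matrix per protein.
cor_by_protein <- lapply(split(toy, toy$Protein), function(d) {
  if (nrow(d) < 2) return(NULL)            # skip single-peptide proteins
  m <- t(as.matrix(d[, -(1:2)]))           # rows = samples, columns = peptides
  colnames(m) <- d$Peptide
  cor(m)
})
cor_by_protein <- Filter(Negate(is.null), cor_by_protein)

# Draw one corrplot per remaining protein, if corrplot is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  for (p in names(cor_by_protein)) {
    corrplot::corrplot(cor_by_protein[[p]], method = "circle",
                       title = p, mar = c(0, 0, 2, 0))
  }
}
```

Squaring each matrix (cor_by_protein[[p]]^2) would show R^2 instead of the signed correlation, at the cost of losing the direction of the relationship.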