
How to add the Spearman correlation p-value along with the correlation coefficient to ggpairs?

I am constructing a ggpairs figure in R using the following code.

df is a data frame containing 6 continuous variables and one Group variable.

library(ggplot2)
library(GGally)

ggpairs(df[, -1], columns = 1:ncol(df[, -1]),
        mapping = ggplot2::aes(colour = df$Group), legends = T, axisLabels = "show",
        upper = list(continuous = wrap("cor", method = "spearman", size = 2.5, hjust = 0.7))) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))

I am trying to add the p-value of the Spearman correlation to the upper panel of the generated figure, i.e. appended to the Spearman correlation coefficient.

Generally, the p-value is computed using cor.test with method passed as "spearman".
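As a minimal sketch of that computation: the built-in mtcars data is used purely for illustration, and the names res, rho and p_value are my own. The coefficient and the p-value can both be read from the object that cor.test returns.

```r
# Spearman correlation between two numeric columns of the built-in mtcars data.
# exact = FALSE asks for the asymptotic p-value, which avoids the
# "cannot compute exact p-value with ties" warning on tied data.
res <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman", exact = FALSE)

rho     <- unname(res$estimate)  # Spearman's rho
p_value <- res$p.value           # p-value of the test

cat("rho =", signif(rho, 2), " p =", signif(p_value, 2), "\n")
```

Note that the method name has to be given in lowercase here.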

I am also aware of a Stack Overflow post discussing a similar query, but I need this for ggpairs, for which that solution does not work. Also, the previous query has not been solved yet.

How to add p values for Spearman correlation coefficients plotted using pairs in R

I have a feeling this is more than what you expected, but you need to define a custom function like ggally_cor. First we have a function that prints the correlation between 2 variables:

printVar <- function(x, y) {
  # Spearman's rho and its p-value, each kept to 2 significant digits
  vals <- cor.test(x, y, method = "spearman")[c("estimate", "p.value")]
  names(vals) <- c("rho", "p")
  paste(names(vals), signif(unlist(vals), 2), collapse = "\n")
}

Then we define a function that takes in the data for each pair, calculates (1) the overall correlation and (2) the correlation by group, and passes both into a ggplot that basically only prints this text:

my_fn <- function(data, mapping, ...) {
  # x, y and colour (group) values for the current panel
  xData <- eval_data_col(data, mapping$x)
  yData <- eval_data_col(data, mapping$y)
  colorData <- eval_data_col(data, mapping$colour)

  # if you have colors, split according to color group and calculate cor
  byGroup <- by(data.frame(xData, yData), colorData,
                function(i) printVar(i[, 1], i[, 2]))
  byGroup <- data.frame(col = names(byGroup), label = as.character(byGroup))
  byGroup$x <- 0.5
  byGroup$y <- seq(0.8 - 0.3, 0.2, length.out = nrow(byGroup))

  # main correlation
  mainCor <- printVar(xData, yData)

  # text-only panel: overall label on top, per-group labels below
  p <- ggplot(data = data, mapping = mapping) +
    annotate(x = 0.5, y = 0.8, label = mainCor, geom = "text", size = 3) +
    geom_text(data = byGroup, inherit.aes = FALSE,
              aes(x = x, y = y, col = col, label = label), size = 3) +
    theme_void() + ylim(c(0, 1))
  p
}

Now I use mtcars, with a random Group as the first column:

df <- data.frame(
  Group = sample(LETTERS[1:2], nrow(mtcars), replace = TRUE),
  mtcars[, 1:6]
)

And plot:

ggpairs(df[, -1], columns = 1:ncol(df[, -1]),
        mapping = ggplot2::aes(colour = df$Group),
        axisLabels = "show",
        upper = list(continuous = my_fn)) +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))


I think for your own plot, the spacing of the text might not be optimal, but it's just a matter of tweaking my_fn.

Works well. But the signif rounding is probably not ideal for the p-value. Let me explain why. signif keeps two significant digits rather than two decimal places, so a very small p-value is not rounded to a fixed width and is printed in scientific notation (with the power of 10 represented as e). Using the round function instead is not good either: with rounding to 2 places, a p-value less than 0.001 comes out as 0, and so does any other p-value below 0.005, even though it only satisfies p < 0.01.
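To make the difference concrete, here is a small sketch with made-up p-values (the vector p and the names sig_txt and rnd_txt are only for illustration). It converts the rounded numbers to text with paste, the same way printVar does:

```r
# Illustrative p-values, invented for this demonstration
p <- c(0.2, 0.0034567, 1.2e-05)

# signif() keeps 2 significant digits, so tiny values stay in scientific notation
sig_txt <- paste(signif(p, 2))

# round() keeps 2 decimal places, so small values collapse to 0
rnd_txt <- paste(round(p, 2))

print(data.frame(p = p, signif = sig_txt, round = rnd_txt))
```

Neither column reads well in a small plot panel, which is why a threshold label such as "<0.001" is used instead.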

So a mild modification of the code will take care of it.

printVar <- function(x, y) {
  vals <- cor.test(x, y, method = "spearman")[c("estimate", "p.value")]

  # round rho to 2 decimals; report small p-values as thresholds
  vals[[1]] <- round(vals[[1]], 2)
  vals[[2]] <- ifelse(test = vals[[2]] < 0.001, "<0.001",
                      ifelse(test = vals[[2]] < 0.01, "<0.01", round(vals[[2]], 2)))

  names(vals) <- c("rho", "p")
  paste(names(vals), unlist(vals), collapse = "\n")
}
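As a possible alternative to the nested ifelse, base R's format.pval can produce a similar threshold label. This is only a sketch: fmt_p is a hypothetical helper name, not part of the code above, and it uses a single cutoff (eps) instead of the two tiers "<0.001" and "<0.01".

```r
# Hypothetical helper (not from the original answer): format one p-value with
# base R's format.pval(), reporting anything below `eps` as a "<" threshold.
fmt_p <- function(p, eps = 0.001) {
  format.pval(p, digits = 2, eps = eps)
}

fmt_p(0.2)      # ordinary p-value, formatted to 2 significant digits
fmt_p(1.2e-05)  # below eps, so reported as a "<" threshold
```

If this suits your plot, the result of fmt_p could replace the value assigned to vals[[2]] in printVar.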

Secondly, if we run the code as is, it gives an error that LAB is not found.

LAB is a character string required for the label.

You can either supply a character string or just pass

LAB=c()

Not sure if it's because you have groups or are using a different version of the package (I'm using GGally_2.1.1), but the following code works perfectly for me.

df %>% ggpairs(upper = list(continuous = wrap("cor", method = "spearman")))
