Generate correlation matrix with specific columns and only with significant values in corrplot

I have a data.frame with 14 columns. I split these columns into two groups: [,1:6] and [,7:14].

df<-read.csv("http://renatabrandt.github.io/EBC2015/data/varechem.csv", row.names=1)

df

I would like to calculate the correlation between these two groups of columns. For that I used this command, and it worked very well:

library(corrplot)

# Correlate columns [1:6] with columns [7:14] only.
correlation_df <- cor(df[, 1:6],
                      df[, 7:14], method = "spearman", use = "pairwise.complete.obs")

# Plot the correlations for the specific columns
corrplot(correlation_df,
         method = "color", addCoef.col = "black")

However, in addition to calculating the correlation, I would like the graph to show only the significant correlations (p-value < 0.05). I tried the following code, but it didn't work: the resulting plot was wrong.

#I can get the significance level matrix
correlation_df_sig<-cor.mtest(df, conf.level = 0.95, method = "spearman")
correlation_df_sig

# Generate the correlation plot with only the significant values

plot2<-corrplot(correlation_df,
         p.mat = correlation_df_sig$p,
         insig='blank',
         addCoef.col = "black")
plot2

What could I do to fix this plot?

Note: I also tried to generate the complete matrix without considering the [,1:6] and [,7:14] groups, but that went wrong too. Also, I don't want to calculate the correlation between columns in the same group, e.g. column 1 with column 2, column 1 with column 3, and so on.

plot1<-corrplot(cor(df, method = 'spearman', use = "pairwise.complete.obs"),
         method = 'color', 
         addCoef.col = 'black',
         p.mat = correlation_df_sig$p,
         insig='blank',
         diag = FALSE,
         number.cex = 0.5,
         type='upper'
         )
plot1

I would use the well-established Hmisc::rcorr for the calculations. In corrplot::corrplot, subset both the corr= and the p.mat= arguments with [1:6, 7:14].

# rcorr() needs a numeric matrix of the raw data; it returns the full 14 x 14 r and P matrices
c_df <- Hmisc::rcorr(as.matrix(df), type='spearman')

library(corrplot)
corrplot(corr=c_df$r[1:6, 7:14], p.mat=c_df$P[1:6, 7:14], sig.level=0.05, 
         method='color', diag=FALSE, addCoef.col=1, type='upper', insig='blank',
         number.cex=.8)
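The plot in the question presumably went wrong because the two matrices had different shapes: `correlation_df` is 6 x 8, while the p-value matrix was computed on all 14 columns and is 14 x 14. The sketch below illustrates that mismatch and the subsetting fix using base R only. The data frame `df_demo` is synthetic stand-in data, not the varechem data, and the p-values are built with `stats::cor.test` rather than `Hmisc::rcorr`.

```r
set.seed(1)
# Synthetic stand-in for the 14-column data frame (columns V1..V14); an assumption for illustration
df_demo <- as.data.frame(matrix(rnorm(24 * 14), nrow = 24))

# Cross-group Spearman correlations: 6 rows (V1..V6) by 8 columns (V7..V14)
r_block <- cor(df_demo[, 1:6], df_demo[, 7:14], method = "spearman")

# Full 14 x 14 p-value matrix, filled pair by pair with stats::cor.test
p_full <- matrix(NA_real_, 14, 14,
                 dimnames = list(names(df_demo), names(df_demo)))
for (i in 1:14) {
  for (j in 1:14) {
    if (i != j) {
      p_full[i, j] <- cor.test(df_demo[[i]], df_demo[[j]],
                               method = "spearman", exact = FALSE)$p.value
    }
  }
}

dim(r_block)  # 6 x 8
dim(p_full)   # 14 x 14, so it cannot be paired cell by cell with r_block

# Take the same rows and columns from the p-value matrix
p_block <- p_full[1:6, 7:14]
identical(dimnames(r_block), dimnames(p_block))
```

Once both matrices have the same dimensions and names, each cell of `p.mat` refers to the same pair of variables as the corresponding cell of `corr`.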

This appears to correspond to the p-values:

m <- c_df$P[1:6, 7:14] < .05
m[lower.tri(m, diag=TRUE)] <- ''
as.data.frame(replace(m, lower.tri(m, diag=TRUE), ''))
#       Al    Fe    Mn    Zn    Mo Baresoil Humdepth    pH
# N        FALSE FALSE  TRUE FALSE    FALSE    FALSE FALSE
# P               TRUE  TRUE FALSE    FALSE    FALSE FALSE
# K                     TRUE FALSE    FALSE    FALSE  TRUE
# Ca                         FALSE     TRUE     TRUE FALSE
# Mg                                   TRUE     TRUE  TRUE
# S                                            FALSE FALSE
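Since the question asks to skip within-group correlations entirely, another option is to compute only the cross-group pairs. The following is a minimal sketch under stated assumptions: `cross_cor` is a hypothetical helper (not part of corrplot or Hmisc) that loops over `stats::cor.test`, and `demo` is synthetic data with one strong cross-group relationship forced in so that something is significant.

```r
# Hypothetical helper: Spearman r and p-value for each column of x against each column of y
cross_cor <- function(x, y, method = "spearman") {
  r <- p <- matrix(NA_real_, ncol(x), ncol(y),
                   dimnames = list(colnames(x), colnames(y)))
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      # cor.test() drops incomplete pairs itself; exact = FALSE avoids tie warnings
      ct <- cor.test(x[[i]], y[[j]], method = method, exact = FALSE)
      r[i, j] <- unname(ct$estimate)
      p[i, j] <- ct$p.value
    }
  }
  list(r = r, p = p)
}

set.seed(42)
# Synthetic 14-column data (V1..V14), an assumption for illustration only
demo <- as.data.frame(matrix(rnorm(30 * 14), nrow = 30))
demo$V7 <- demo$V1 + rnorm(30, sd = 0.1)  # force one strong cross-group correlation

res <- cross_cor(demo[, 1:6], demo[, 7:14])

# Keep only coefficients with p < 0.05; everything else becomes NA
sig_only <- ifelse(res$p < 0.05, round(res$r, 2), NA)
print(sig_only, na.print = "")
```

The two matrices can then be passed straight to `corrplot(res$r, p.mat = res$p, sig.level = 0.05, insig = "blank", method = "color", addCoef.col = "black")` without any further subsetting, because both are already 6 x 8. The p-values from this sketch are not guaranteed to match `Hmisc::rcorr` exactly, so treat it as an alternative rather than a drop-in replacement.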
