
chart.Correlation with continuous and categorical variables

I want to see whether my variables are correlated. This is the structure of the dataset:

'data.frame':   189 obs. of  20 variables:
 $ age            : num  24 31 32 35 36 26 31 24 35 36 ...
 $ diplM2         : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ...
 $ TimeDelcat     : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ...
 $ SeasonDel      : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ...
 $ BMIM2          : num  23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ...
 $ WgtB2          : int  3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ...
 $ sex            : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ...
 $ smoke          : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ...
 $ nRBC           : num  0.1621 0.0604 0.1935 0.0527 0.1118 ...
 $ CD4T           : num  0.1427 0.2143 0.1432 0.0686 0.0979 ...
 $ CD8T           : num  0.1574 0.1549 0.1243 0.0804 0.0782 ...
 $ NK             : num  0.02817 0 0.04368 0.00641 0.02398 ...
 $ Bcell          : num  0.1033 0.1124 0.1468 0.0551 0.0696 ...
 $ Mono           : num  0.0633 0.0641 0.0773 0.0531 0.0656 ...
 $ Gran           : num  0.428 0.442 0.329 0.716 0.6 ...
 $ chip           : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ...
 $ pos            : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ...
 $ trim1PM25ifdmv4: num  9.45 13.81 15.59 7.13 15.43 ...
 $ trim2PM25ifdmv4: num  13.27 15.53 10.69 13.56 9.27 ...
 $ trim3PM25ifdmv4: num  16.72 16.21 12.17 6.47 10.66 ...

As you can see, there are both continuous and categorical variables. When I run chart.Correlation(variables, histogram = TRUE, method = "pearson")

I get this error:

Error in pairs.default(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor,  : 
  non-numeric argument to 'pairs'

How can I fix this? Thank you.

I believe you want correlations only between the numeric variables. The code below computes them and reports each pair of variables only once.

library(reshape2)  
data <- data.frame(x1=rnorm(10),
            x2=rnorm(10),
            x3=rnorm(10),
            x4=c("a","b","c","d","e","f","g","h","i","j"),
            x5=c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))  

data
       x1         x2         x3     x4 x5
1  -1.2169793  0.5397598  0.4981513  a ab
2  -0.7032631 -2.1262837 -1.0377371  b sp
3   0.8766831 -0.2326975 -0.1219613  c sp
4   0.3405332  2.4766225 -1.1960618  d dd
5   0.1889945  0.3444534  1.9659062  e hg
6   0.8086956  0.4654644 -1.2526696  f hj
7  -0.6850181 -1.7657241  0.5156620  g qw
8   0.8518034  0.9484547  1.4784063  h dh
9   0.5191793  1.2246566  1.3867829  i ko
10  0.4568953 -0.6881464  0.3548839  j jk

# correlations among all numeric columns
# (is.numeric() also matches integer columns such as WgtB2 in the question)
corr <- cor(data[sapply(data, is.numeric)])
# convert the correlation matrix to long format
res <- melt(corr)
# label each pair independently of order, so x1*x2 and x2*x1 get the same label
res$type <- apply(res, 1, function(x)
  paste(sort(c(as.character(x[1]), as.character(x[2]))), collapse = "*"))
# keep only one row per pair
res <- unique(res[, c("type", "value")])

res
 type      value
x1*x1 1.00000000
x1*x2 0.44024939
x1*x3 0.04936654
x2*x2 1.00000000
x2*x3 0.08859169
x3*x3 1.00000000
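
The same column filter can also be applied before calling chart.Correlation itself, which is what the question asked about. The following is only a sketch: the `variables` data frame here is a small made-up stand-in for the asker's data (the column names are borrowed from the str() output, the values are random), and `num_vars` is a name introduced for illustration. The plot call is guarded so the snippet still runs if PerformanceAnalytics is not installed.

```r
# Hypothetical stand-in for the asker's data: three numeric columns and one factor.
set.seed(1)
variables <- data.frame(
  age   = rnorm(20, mean = 30, sd = 5),
  BMIM2 = rnorm(20, mean = 23, sd = 3),
  WgtB2 = as.integer(round(rnorm(20, mean = 3400, sd = 300))),
  sex   = factor(sample(1:2, 20, replace = TRUE))
)

# Keep only the numeric (including integer) columns; factors are dropped.
num_vars <- variables[sapply(variables, is.numeric)]

# chart.Correlation passes its input on to pairs(), which rejects non-numeric columns,
# so it is given the filtered data frame instead of the full one.
if (requireNamespace("PerformanceAnalytics", quietly = TRUE)) {
  PerformanceAnalytics::chart.Correlation(num_vars, histogram = TRUE, method = "pearson")
}
```

Note that this simply leaves the factor columns out of the plot; it does not measure any association between the categorical and the continuous variables.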
