
How to calculate a correlation matrix between binary variables in R?

I have a data frame of 10 binary variables that looks like this:

V1 V2 V3...
0  1  1
1  1  0
1  0  1
0  0  1  

I need the correlation matrix so that I can do factor analysis.
psych::corr.test can calculate a correlation matrix, but it only offers the pearson, spearman, and kendall methods, which are not meant for binary data.
So how do I calculate the correlation matrix of this data frame?

Correlation methods are suited to continuous data. See https://www.quora.com/Is-it-possible-to-calculate-correlations-between-binary-variables
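That said, cor() will still run on 0/1 columns, and the Pearson coefficient computed on two binary variables is the same number as the phi coefficient. A minimal sketch, assuming a simulated matrix m standing in for the real data frame (whether phi is suitable input for your factor analysis is a separate question):

```r
# Simulated stand-in for the real data: 20 rows, 10 binary columns (an assumption).
set.seed(42)
m <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- paste0("V", 1:10)

# Pearson correlation computed directly on the 0/1 columns.
r <- cor(m)

# Phi coefficient for V1 and V2, computed by hand from their 2x2 table.
tab <- table(factor(m[, 1], levels = 0:1), factor(m[, 2], levels = 0:1))
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# The two values agree up to floating-point error.
all.equal(unname(r[1, 2]), unname(phi))
```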

You could try non-parametric methods instead; see http://www.cedar.buffalo.edu/papers/articles/CVPRIP03_propbina.pdf

You can still do factor analysis: calculate the % match between variables and remove any variable that matches another in more than x% of rows. This way you reduce the dimension of the data.

# create data: 20 rows x 10 binary columns
m <- matrix(sample(x = 0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]
m
# create match matrix: res[i, j] is the proportion of rows
# on which columns i and j have the same value
res <- sapply(seq_len(ncol(m)), function(i) colMeans(m == m[, i]))
dimnames(res) <- list(colnames(m), colnames(m))
res
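The code above builds the match matrix but stops short of the filtering step (dropping variables that match another one in more than x% of rows). A rough sketch of that step, assuming a cutoff of 0.8 and a greedy rule that keeps the earlier column of each pair; both choices are illustrative and not part of the answer:

```r
set.seed(1)
m <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]
m[, "J"] <- m[, "A"]   # force one exact duplicate so there is something to drop

# Proportion of rows on which each pair of columns agrees.
res <- sapply(seq_len(ncol(m)), function(i) colMeans(m == m[, i]))
dimnames(res) <- list(colnames(m), colnames(m))

threshold <- 0.8       # the "x%" cutoff; an assumed value
keep <- rep(TRUE, ncol(m))
for (j in seq_len(ncol(m))[-1]) {
  earlier <- which(keep[seq_len(j - 1)])
  # Drop column j if it agrees too often with any column already kept.
  if (any(res[earlier, j] > threshold)) keep[j] <- FALSE
}

m_reduced <- m[, keep, drop = FALSE]
colnames(m_reduced)
```

Note that a pair matching in close to 0% of rows is just as redundant (one column is the complement of the other), so you may want to test res < 1 - threshold as well.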

You can use hierarchical clustering on the columns:

hclust(dist(t(x)))  # x is the binary matrix; t() so that columns, not rows, are clustered

Better still, you can choose a clustering method such as "ward.D", "ward.D2", "single", "complete"...; see https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/hclust
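A small sketch of how this might look end to end, assuming simulated data, the "binary" distance from base dist(), and an arbitrary cut into 3 groups; none of these choices come from the answer itself:

```r
set.seed(1)
m <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]

# dist() compares rows, so transpose to compare columns (variables).
# method = "binary" is base R's asymmetric binary (Jaccard-type) distance.
d <- dist(t(m), method = "binary")

# Swap in "ward.D2", "single", "average", ... to compare linkage methods.
hc <- hclust(d, method = "complete")
plot(hc)                      # dendrogram of the 10 variables

groups <- cutree(hc, k = 3)   # k = 3 is arbitrary
groups
```

Variables that land in the same group are candidates for being merged or dropped before the factor analysis.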

Another option is to visualize your binary matrix as a heatmap, where similar variables with common features end up next to each other.
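For instance, base R's heatmap() reorders rows and columns by hierarchical clustering before drawing. A minimal sketch on simulated data, using the default Euclidean distance and a two-colour palette (both assumptions):

```r
set.seed(1)
m <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]

# scale = "none" keeps the raw 0/1 values; white = 0, black = 1.
hm <- heatmap(m, scale = "none", col = c("white", "black"))

# Column order chosen by the clustering, left to right.
colnames(m)[hm$colInd]
```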
