I have a data frame of 10 binary variables that looks like this:
V1 V2 V3...
0 1 1
1 1 0
1 0 1
0 0 1
I need the correlation matrix so that I can then do factor analysis.
psych::corr.test can calculate the correlation matrix, but it only offers the pearson, spearman, and kendall methods, which are not meant for binary data.
So how can I calculate the correlation matrix of this data frame?
Correlation methods are suited to continuous data. See https://www.quora.com/Is-it-possible-to-calculate-correlations-between-binary-variables
You could try non-parametric methods instead; see http://www.cedar.buffalo.edu/papers/articles/CVPRIP03_propbina.pdf
You can still get to factor analysis: calculate the percentage of matching values for each pair of variables and remove any variable that matches another one in more than x% of rows. This way you reduce the dimension of the data.
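# create example data: 20 rows x 10 binary variables
m <- matrix(sample(x = 0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]
m
# create the pairwise match matrix: entry [i, j] is the proportion of rows
# in which columns i and j hold the same value (a similarity, not a true correlation)
res <- matrix(NA_real_, nrow = ncol(m), ncol = ncol(m),
              dimnames = list(colnames(m), colnames(m)))
for (i in seq_len(ncol(m))) {
  # m == m[, i] compares every column with column i, row by row
  res[i, ] <- colMeans(m == m[, i])
}
res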
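The "remove variables matching >x%" step is not spelled out above, so here is a minimal sketch of one way it could be done. The threshold of 0.8, the greedy keep-the-first rule, and the names `threshold`, `drop_vars`, and `m_reduced` are assumptions made for illustration, not part of the original answer.

```r
set.seed(42)
m <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(m) <- LETTERS[1:10]

# Pairwise match proportion for a 0/1 matrix:
# crossprod(m) counts rows where both columns are 1,
# crossprod(1 - m) counts rows where both columns are 0.
res <- (crossprod(m) + crossprod(1 - m)) / nrow(m)

threshold <- 0.8          # hypothetical cut-off, i.e. the "x%" in the text
drop_vars <- character(0)

# Greedy rule (an assumption): walk through the columns in order and drop a
# column if it matches any earlier, still-kept column in more than `threshold`.
for (j in seq_len(ncol(res))[-1]) {
  earlier <- setdiff(colnames(res)[seq_len(j - 1)], drop_vars)
  if (any(res[earlier, j] > threshold)) {
    drop_vars <- c(drop_vars, colnames(res)[j])
  }
}

m_reduced <- m[, setdiff(colnames(m), drop_vars), drop = FALSE]
dim(m_reduced)
```

With purely random data few or no columns are likely to be dropped; on real data the choice of threshold decides how aggressively near-duplicate variables are removed.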
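You can use hierarchical clustering on the columns. hclust expects a distance object, so transpose the matrix and pass it through dist first:
hclust(dist(t(x)))
Even better, you can choose the clustering method through the method argument: "ward.D", "ward.D2", "single", "complete"... https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/hclust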
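As a rough sketch of clustering the columns, assuming `x` is a 0/1 matrix with one variable per column: the example data, the "binary" distance, the "complete" linkage, and the cut into `k = 3` groups are all illustrative choices, not recommendations from the answer.

```r
set.seed(1)
x <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(x) <- paste0("V", 1:10)

# dist() compares rows, so transpose to compare variables (columns).
# method = "binary" is the asymmetric binary (Jaccard-type) distance in stats::dist.
d <- dist(t(x), method = "binary")

# Any linkage accepted by hclust() can be swapped in here.
hc <- hclust(d, method = "complete")

# Cut the tree into k groups of similar variables (k = 3 is arbitrary).
groups <- cutree(hc, k = 3)
groups
```

Calling `plot(hc)` afterwards draws the dendrogram, which helps in picking a sensible number of groups.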
Another solution would be to visualize your binary matrix as a heatmap, so that similar variables with common features end up next to each other.
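A minimal sketch of the heatmap idea using base R's `heatmap()`, which reorders rows and columns by hierarchical clustering. The example data and the use of the "binary" distance for `distfun` are assumptions for illustration.

```r
set.seed(1)
x <- matrix(sample(0:1, size = 200, replace = TRUE), ncol = 10)
colnames(x) <- paste0("V", 1:10)

# scale = "none" keeps the raw 0/1 values instead of standardising rows;
# distfun swaps the default Euclidean distance for the binary one.
h <- heatmap(x,
             scale   = "none",
             distfun = function(z) dist(z, method = "binary"),
             col     = c("white", "black"))

# Column order chosen by the clustering, left to right in the plot.
colnames(x)[h$colInd]
```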