
Hierarchical Plot of Columns of Strings

I have a data frame of 10 rows by 7 columns. Every cell is a string.

Is there a package that does hierarchical clustering (or coloring) of the columns?

For example, suppose it was five columns by five rows, like this:

V1 V2 V3 V4 V5
a  a  c  d  e 
b  b  d  f  b
c  c  e  a  c
d  d  g  b  d
e  f  h  c  e

Is there a package that would show V1/V2 as highly correlated and plot it? Let's say the "correlation" between two columns is based strictly on how many of their elements match row by row.

> d <- data.frame(V1=c('a','b','c','d','e'), V2=c('a','b','c','d','f'), V3=c('c','d','e','g','h'), V4=c('d','f','a','b','c'), V5=c('e','b','c','d','e'), stringsAsFactors=FALSE)
> # res[i, j] = number of rows in which columns i and j hold the same string
> res <- outer(1:5, 1:5, FUN=Vectorize(function(i, j) sum(d[,i] == d[,j])))
> res
     [,1] [,2] [,3] [,4] [,5]
[1,]    5    4    0    0    4
[2,]    4    5    0    0    3
[3,]    0    0    5    0    0
[4,]    0    0    0    5    0
[5,]    4    3    0    0    5

> library(corrplot)
> corrplot(res/5)


See https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html for more plotting options, including clustering. Note: in your example V1/V2 and V1/V5 are equally "highly correlated", since each pair matches in 4 of the 5 rows.
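
If you want an actual dendrogram of the columns rather than just a colored matrix, here is a minimal sketch using base R's `hclust`. It assumes you are happy to treat "1 minus the fraction of matching rows" as the distance between two columns, and that average linkage is acceptable; both are my choices, not something the question fixes. The names `sim`, `hc` and `groups` and the cut height of 0.5 are only illustrative.

```r
# Example data from the question (five string columns, five rows).
d <- data.frame(
  V1 = c("a", "b", "c", "d", "e"),
  V2 = c("a", "b", "c", "d", "f"),
  V3 = c("c", "d", "e", "g", "h"),
  V4 = c("d", "f", "a", "b", "c"),
  V5 = c("e", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Similarity = fraction of rows in which two columns hold the same string.
# The result is a symmetric matrix with V1..V5 as row and column names.
sim <- sapply(d, function(x) sapply(d, function(y) mean(x == y)))

# Assumption: use 1 - similarity as the distance, with average linkage.
hc <- hclust(as.dist(1 - sim), method = "average")
plot(hc, main = "Columns clustered by fraction of matching rows")

# Cut the tree at an illustrative height to get column groups.
groups <- cutree(hc, h = 0.5)

# Optional: let corrplot reorder the matrix by hierarchical clustering,
# but only if the package is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(sim, order = "hclust")
}
```

With this data V1, V2 and V5 should land in one group, while V3 and V4 each stay on their own. For the real 10 x 7 data frame the same code should work unchanged, as long as all columns have the same number of rows.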
