
Extracting gap statistic info to identify K for Kmeans clustering

I was looking at the 'cluster' library, which has the function 'clusGap' for estimating the number of clusters for k-means clustering.

This is the code:

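library(cluster)

# Compute Gap statistic (http://web.stanford.edu/~hastie/Papers/gap.pdf)
computeGapStatistic <- function(data) {
    # Use the 'data' argument rather than a global data set
    gap <- clusGap(data, FUN = kmeans, K.max = 8, B = 3)
    # ENABLE_PLOTS is a global flag defined elsewhere
    if (ENABLE_PLOTS) {
        plot(gap, main = "Gap statistic for the Nursing shift data")
    }
    print(gap)
    return(gap)
}

gap <- computeGapStatistic(shift_len_avg_data)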

Which gives me the following output when 'gap' is printed out:

> print(gap)
Clustering Gap statistic ["clusGap"].
B=3 simulated reference sets, k = 1..8
--> Number of clusters (method 'firstSEmax', SE.factor=1): 2
        logW   E.logW         gap      SE.sim
[1,] 8.702334 9.238385  0.53605067 0.007945542
[2,] 7.940133 8.544323  0.60418996 0.003790244
[3,] 7.772673 8.139836  0.36716303 0.005755805
[4,] 7.325798 7.849233  0.52343473 0.002732731
[5,] 7.233667 7.629954  0.39628748 0.003496058
[6,] 7.020220 7.439709  0.41948820 0.006451708
[7,] 6.707678 7.285907  0.57822872 0.002810682
[8,] 7.166932 7.150724 -0.01620749 0.004274151

And this is what the plot looks like:

[Image: plot of the gap statistic against the number of clusters k]

Question:

How do I extract the number of clusters from the 'gap' variable? 'gap' seems to be a list. From the printed output above, it seems to have found 2 clusters.

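I figured this out on my own. This is what I used: with(gap, maxSE(Tab[, "gap"], Tab[, "SE.sim"]))

'gap' is a list whose 'Tab' component holds the table printed above. 'maxSE' uses method 'firstSEmax' with SE.factor = 1 by default, which is the same rule named in the printed output.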
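For anyone who wants to check this without the original data, here is a minimal sketch that rebuilds the 'gap' and 'SE.sim' columns from the printed output above and passes them to maxSE() from the 'cluster' package. The name 'tab' is made up for illustration and stands in for gap$Tab. The two extra calls just show other selection rules that maxSE() offers; they are not part of the original approach.

```r
library(cluster)  # provides maxSE()

# Columns copied from the printed clusGap output above (k = 1..8).
# 'tab' is a hypothetical stand-in for gap$Tab.
tab <- cbind(
  gap    = c(0.53605067, 0.60418996, 0.36716303, 0.52343473,
             0.39628748, 0.41948820, 0.57822872, -0.01620749),
  SE.sim = c(0.007945542, 0.003790244, 0.005755805, 0.002732731,
             0.003496058, 0.006451708, 0.002810682, 0.004274151)
)

# Default rule: method = "firstSEmax", SE.factor = 1 (as in print(gap)).
k <- maxSE(tab[, "gap"], tab[, "SE.sim"])

# Other selection rules offered by maxSE(), for comparison.
k_tibs   <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "Tibs2001SEmax")
k_global <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "globalmax")

print(c(firstSEmax = k, Tibs2001SEmax = k_tibs, globalmax = k_global))
```

On these numbers all three rules select 2. With the real object, maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"]) should be equivalent to the with(gap, ...) form.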
