SPSS K-means & R

Question

What would be the best function/package to use in R to try and replicate the K-means clustering method used in SPSS? Here is an example of the syntax I would use in SPSS:

QUICK CLUSTER VAR1 TO VAR10       
   /MISSING=LISTWISE                  
   /CRITERIA=CLUSTER(5) MXITER(50) CONVERGE(.02)
   /METHOD=KMEANS(NOUPDATE)

Thanks!

Answer 1

In SPSS, use the /PRINT INITIAL option. This prints the initial cluster centers, which seem to be fixed in SPSS but are chosen at random in R unless you supply them yourself (see the centers argument in ?kmeans).

If you pass the initial cluster centers printed by SPSS as centers and set algorithm="Lloyd" in kmeans, you should get the same results (at least it worked for me, testing with several repetitions).

Example of an SPSS-output of the initial cluster centers:

           Cluster
           Cl1  Cl2  Cl3  Cl4
Var A      1    1    4    3
Var B      4    1    4    1
Var C      1    1    1    4
Var D      1    4    4    1
Var E      1    4    1    2
Var F      1    4    4    3

This table, replicated as a matrix in R, with the kmeans computation:

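# Filled column by column: rows are clusters (Cl1..Cl4), columns are variables (Var A..Var F)
mat <- matrix(c(1,1,4,3,4,1,4,1,1,1,1,4,1,4,4,1,1,4,1,2,1,4,4,3), nrow=4, ncol=6)
# dat stands for your own data frame; na.omit() drops incomplete cases
kmeans(na.omit(dat), centers=mat, iter.max=20, algorithm="Lloyd")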

Be sure to use the same maximum number of iterations in SPSS and in R's kmeans, and use the Lloyd algorithm in R's kmeans.
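For anyone who wants to try this end to end, here is a minimal sketch. The data are simulated and purely hypothetical (they are not from my SPSS run), and the names dat, spss, init and complete are my own. It enters the SPSS table as printed (variables in rows) and transposes it, since kmeans expects one row per cluster. iter.max = 50 is taken from MXITER(50) in the question, and na.omit() stands in for /MISSING=LISTWISE.

```r
# Hypothetical data: 200 cases on six variables scored 1 to 4 (simulated, not real output)
set.seed(1)
dat <- as.data.frame(matrix(sample(c(1, 2, 3, 4), 200 * 6, replace = TRUE), ncol = 6,
                            dimnames = list(NULL, paste0("Var", LETTERS[1:6]))))
dat[sample(200, 5), "VarA"] <- NA   # a few missing values to exercise na.omit()

# Initial centers typed row by row, exactly as laid out in the SPSS table above
spss <- matrix(c(1, 1, 4, 3,
                 4, 1, 4, 1,
                 1, 1, 1, 4,
                 1, 4, 4, 1,
                 1, 4, 1, 2,
                 1, 4, 4, 3),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("Var", LETTERS[1:6]), paste0("Cl", 1:4)))
init <- t(spss)   # kmeans() wants one row per cluster, one column per variable

complete <- na.omit(dat)   # drop incomplete cases (listwise deletion)
fit <- kmeans(complete, centers = init, iter.max = 50, algorithm = "Lloyd")
fit$size      # cluster sizes
fit$centers   # final cluster centers, to compare with the SPSS output
```

Typing the table by row and transposing gives the same matrix as the one-line version with nrow=4, ncol=6, but it is easier to check against the SPSS output by eye.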

However, I don't know whether fixed or random initial centers are the better choice. I personally prefer random centers: I run a linear discriminant analysis on the resulting cluster groups to assess the classification accuracy, and rerun the kmeans clustering until I get a satisfying group classification.
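A rough sketch of that workflow, again on simulated and purely hypothetical data. Note one substitution: instead of rerunning by hand, it uses the nstart argument of kmeans, which tries several random starts and keeps the one with the lowest total within-cluster sum of squares. That is not the same criterion as the LDA accuracy, so treat it as a convenient stand-in. lda() comes from the MASS package.

```r
library(MASS)   # provides lda()

set.seed(42)
# Hypothetical data: three shifted groups of 50 cases on four variables
x <- rbind(matrix(rnorm(200, mean = 0), ncol = 4),
           matrix(rnorm(200, mean = 3), ncol = 4),
           matrix(rnorm(200, mean = 6), ncol = 4))
colnames(x) <- paste0("V", 1:4)

# 25 random sets of initial centers; the best solution is kept
fit <- kmeans(x, centers = 3, nstart = 25)

# Discriminant analysis with the cluster membership as the grouping variable
da <- lda(x, grouping = factor(fit$cluster))
pred <- predict(da, x)$class

# Share of cases that LDA assigns back to their own cluster (resubstitution accuracy)
acc <- mean(as.character(pred) == as.character(fit$cluster))
acc
table(cluster = fit$cluster, lda = pred)
```

Because the accuracy is computed on the same cases used to fit the discriminant functions, it is optimistic; lda(..., CV = TRUE) gives leave-one-out predictions if you want a stricter check.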

Edit: I found this posting where the SPSS procedure of selecting initial clusters is described. Perhaps somebody knows of an R implementation?
