What would be the best function/package to use in R to try and replicate the K-means clustering method used in SPSS? Here is an example of the syntax I would use in SPSS:
QUICK CLUSTER VAR1 TO VAR10
/MISSING=LISTWISE
/CRITERIA=CLUSTER(5) MXITER(50) CONVERGE(.02)
/METHOD=KMEANS(NOUPDATE)
Thanks!
In SPSS, use the /PRINT INITIAL option. This will give you the initial cluster centers, which seem to be fixed in SPSS but are random in R (see ?kmeans for the centers parameter).
If you pass the initial cluster centers printed in the SPSS output as centers and set algorithm="Lloyd" in kmeans, you should get the same results (at least it worked for me, testing with several repetitions).
Example of an SPSS-output of the initial cluster centers:
         Cluster
         Cl1  Cl2  Cl3  Cl4
Var A      1    1    4    3
Var B      4    1    4    1
Var C      1    1    1    4
Var D      1    4    4    1
Var E      1    4    1    2
Var F      1    4    4    3
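This table, replicated as a matrix in R (one row per cluster, one column per variable), with the kmeans computation:
# matrix() fills column by column, so each group of four values becomes one variable's column
mat <- matrix(c(1,1,4,3,4,1,4,1,1,1,1,4,1,4,4,1,1,4,1,2,1,4,4,3), nrow=4, ncol=6)
# dat is a placeholder for your data frame holding the six variables
kmeans(na.omit(dat), centers=mat, iter.max=20, algorithm="Lloyd")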
Be sure to use the same maximum number of iterations in SPSS (MXITER) and in R's kmeans (iter.max), and use the Lloyd algorithm in kmeans.
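For anyone who wants to try this end to end, here is a minimal sketch on simulated data. The data frame `dat`, its variable names, and the choice of `iter.max = 50` (to mirror `MXITER(50)` from the question) are assumptions for illustration only; swap in your own data and the centers from your own SPSS output. Typing the table row by row and transposing it is just an alternative to the column-wise fill used above.

```r
# Simulated stand-in for the SPSS data set: 200 cases, 6 variables scored 1-4
set.seed(1)
dat <- as.data.frame(matrix(as.numeric(sample(1:4, 200 * 6, replace = TRUE)),
                            ncol = 6,
                            dimnames = list(NULL, paste0("Var", LETTERS[1:6]))))
dat[sample(nrow(dat), 5), 1] <- NA  # add a few missing values

# Initial centers typed exactly as SPSS prints them (rows = variables)
spss_centers <- matrix(c(1, 1, 4, 3,
                         4, 1, 4, 1,
                         1, 1, 1, 4,
                         1, 4, 4, 1,
                         1, 4, 1, 2,
                         1, 4, 4, 3),
                       nrow = 6, byrow = TRUE,
                       dimnames = list(paste0("Var", LETTERS[1:6]),
                                       paste0("Cl", 1:4)))
centers <- t(spss_centers)  # kmeans() expects one row per cluster

# Listwise deletion, analogous to /MISSING=LISTWISE
complete <- na.omit(dat)

# Fixed starting centers plus Lloyd's algorithm make the run deterministic
fit <- kmeans(complete, centers = centers, iter.max = 50, algorithm = "Lloyd")
fit$size     # cases per cluster
fit$centers  # final cluster centers, to compare with the SPSS output
```

Because the starting centers are fixed, running the call again on the same data returns the same partition, which is what makes a comparison with SPSS possible in the first place.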
However, I don't know whether it's better to have a fixed or a random choice of initial centers. I personally prefer the random choice: I compute a linear discriminant analysis on the resulting cluster groups to assess the classification accuracy, and rerun the kmeans clustering until I have a satisfying group classification.
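A rough sketch of that random-start workflow, under some assumptions: it uses the built-in `iris` measurements as stand-in data, `lda()` from the MASS package for the discriminant analysis, and the `nstart` argument of kmeans in place of rerunning by hand. The accuracy here is resubstitution accuracy (predicting the same cases the model was fitted on), so treat it as an optimistic check rather than a validation.

```r
library(MASS)  # provides lda()

x <- as.matrix(iris[, 1:4])  # stand-in data: four numeric variables

# Random initial centers; nstart = 25 keeps the best of 25 random starts
set.seed(42)
fit <- kmeans(x, centers = 3, nstart = 25, iter.max = 50)

# Linear discriminant analysis with the cluster labels as the grouping
lda_fit <- lda(x, grouping = factor(fit$cluster))
pred <- predict(lda_fit, x)$class

# Share of cases the discriminant functions put back into their own cluster
accuracy <- mean(as.character(pred) == as.character(fit$cluster))
accuracy

# Cross-tabulation of cluster membership against the LDA prediction
table(cluster = fit$cluster, predicted = pred)
```

If the accuracy is not satisfying, you can change the seed, the number of clusters, or the number of starts and run it again.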
Edit: I found this posting where the SPSS procedure of selecting initial clusters is described. Perhaps somebody knows of an R implementation?