R and SAS: different results for clustering analysis

Question

I'm doing a cluster analysis with both R and SAS, and the results are really different.

I know that k-means starts from random centroids, so a small difference is normal, but here the difference is huge.

I ran a test with the well-known CARS dataset from SAS (sashelp.cars).

With R, I run:

# columns 8 and 10 of CARS are EngineSize and Horsepower
kmeans(CARS[, c(8, 10)], centers = 5)

Result: between_SS / total_SS = 93.2 %
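To see where R's number comes from, here is a minimal sketch that recomputes the ratio printed by kmeans. It assumes base R only: mtcars, with disp and hp standing in for EngineSize and Horsepower, replaces CARS (which does not ship with R), so the values will not match the 93.2 % above. The ratio is betweenss / totss, which is the same as 1 - tot.withinss / totss, i.e. an R-squared-style share of variance explained by the clusters.

```r
# Sketch: recompute the ratio that print.kmeans() reports as
# "(between_SS / total_SS = ...)". mtcars is a stand-in for CARS here;
# disp and hp play the role of EngineSize and Horsepower.
set.seed(1)                       # k-means draws random starting centroids
x <- mtcars[, c("disp", "hp")]
fit <- kmeans(x, centers = 5)

# Ratio as printed by R
ratio <- fit$betweenss / fit$totss

# The same quantity written as an R-squared: 1 - within SS / total SS
r2 <- 1 - fit$tot.withinss / fit$totss

cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
print(all.equal(ratio, r2))
```

Because totss is just the sum of squared deviations from the column means, it does not depend on the clustering; only the split between within and between sums of squares does.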

With SAS, I run:

proc fastclus data=sashelp.cars maxclusters=5;
  var EngineSize Horsepower;
run;

Result: Approximate Expected Over-All R-Squared = 0.96079

On this dataset the difference is smaller, but it is still there. I ran the test a few times and got the same results each time.
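One way to check how much the R side moves between runs is to fix the random seed and then vary it. This is a minimal sketch, again using mtcars (disp, hp) as a stand-in for CARS; the helper ratio_for_seed is hypothetical. With nstart = 1 each run uses a single random set of starting centroids; with nstart = 25, kmeans keeps the best of 25 starts, which usually narrows the spread between runs.

```r
# Sketch: how much does R's k-means ratio move between random starts?
# mtcars (disp, hp) is a stand-in for CARS (EngineSize, Horsepower).
x <- mtcars[, c("disp", "hp")]

# Hypothetical helper: fit k-means under a given seed, return betweenss / totss
ratio_for_seed <- function(seed, nstart) {
  set.seed(seed)                  # fixes the random initial centroids
  fit <- kmeans(x, centers = 5, nstart = nstart)
  fit$betweenss / fit$totss
}

single <- sapply(1:10, ratio_for_seed, nstart = 1)   # one start per run
multi  <- sapply(1:10, ratio_for_seed, nstart = 25)  # best of 25 starts per run

print(range(single))
print(range(multi))
```

Re-running with the same seed reproduces the same result exactly, so any remaining gap to SAS at a fixed seed is not run-to-run noise.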

Where does this difference come from?

Answer 1

I'm pretty sure, from the documentation, that they rely on different algorithms. The SAS documentation vaguely describes a method called "nearest centroid sorting". I don't know much about it in substance, but you could look into other clustering functions (like hclust) or other packages to find something comparable.
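R's kmeans itself offers several algorithms through its algorithm argument: "Hartigan-Wong" (the default), "Lloyd", "Forgy" and "MacQueen". The following minimal sketch fits the same stand-in data (mtcars disp and hp in place of CARS) with each one and compares the explained-variance ratio. It does not establish which of these, if any, reproduces FASTCLUS's nearest centroid sorting; it only shows that the choice of algorithm is a knob worth checking.

```r
# Sketch: fit the same data with each algorithm R's kmeans() offers and
# compare betweenss / totss. mtcars is a stand-in for CARS.
x <- mtcars[, c("disp", "hp")]
algos <- c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")

ratios <- sapply(algos, function(a) {
  set.seed(42)                    # same random starts for every algorithm
  fit <- kmeans(x, centers = 5, algorithm = a,
                nstart = 25, iter.max = 100)
  fit$betweenss / fit$totss
})

print(round(ratios, 4))
```

In R, "Lloyd" and "Forgy" select the same algorithm, so with the same seed those two entries should be identical.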

solution1
2 2013-06-05 10:30:49