R and SAS: different results for cluster analysis

I'm running a cluster analysis in both R and SAS, and the results are very different.

I know the results depend on random starting points, so a small difference is normal, but this difference is huge.

I ran a test with the well-known CARS dataset from SAS (sashelp.cars).

In R, I run this (columns 8 and 10 are EngineSize and Horsepower):

kmeans(CARS[,c(8,10)],5)

Result: between_SS / total_SS = 93.2 %

In SAS, I run this:

proc fastclus data=sashelp.cars maxclusters=5;
  var EngineSize Horsepower;
run;

Result: Approximate Expected Over-All R-Squared = 0.96079

Here the difference is smaller, but it is still there. I ran the test several times and the results stayed the same.

Where does this difference come from?

I'm pretty sure, from the documentation, that the two rely on different algorithms. The SAS documentation vaguely describes a method of "nearest centroid sorting". I don't know anything substantive about that method, but you could look into other clustering functions (like hclust) or other packages to find something comparable.
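
To get a feel for how much the R figure alone can move, here is a minimal sketch that varies the number of random starts and the algorithm passed to kmeans. It does not reproduce PROC FASTCLUS. It assumes the built-in mtcars data (columns disp and hp) as a stand-in for CARS, and the helper explained() is a made-up name for illustration.

```r
# Stand-in data (assumption): mtcars ships with R. The CARS table from the
# question would be used the same way with EngineSize and Horsepower.
x <- mtcars[, c("disp", "hp")]

# Hypothetical helper: the ratio kmeans() prints as between_SS / total_SS.
explained <- function(fit) fit$betweenss / fit$totss

set.seed(1)  # make the random starts reproducible

# One random start versus the best of 25 random starts (default algorithm,
# Hartigan-Wong).
single <- kmeans(x, centers = 5, nstart = 1)
multi  <- kmeans(x, centers = 5, nstart = 25)

# Same data, different algorithm: MacQueen's update rule.
macqueen <- kmeans(x, centers = 5, nstart = 25, iter.max = 50,
                   algorithm = "MacQueen")

# Compare the explained-variance ratios side by side.
round(c(single   = explained(single),
        multi    = explained(multi),
        macqueen = explained(macqueen)), 3)
```

If the three ratios differ on your data, part of the gap between the two programs may simply come from initialization and the algorithm variant, so it is worth fixing nstart and algorithm before comparing against SAS.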
