
Cluster analysis in R: How can I get deterministic results from pvclust?

pvclust is great for cluster analysis in R. However, when it runs as part of a batch operation, it is annoying to get different results for the same data. Obviously, there are many "correct" clusterings of the same data, and it seems that pvclust uses some randomness to determine the clusters of a specific run. Is there any way to get deterministic results?

I want to be able to present a minimal, repeatable analysis package: the data plus an R script, and a separate written document that contains my interpretations of the clustering. Others could then add to the analysis, e.g. by changing the aesthetic appearance of the plots. As things stand, my interpretations will always be out of sync with what someone else gets when they run the script containing pvclust.

This applies beyond cluster analysis: whenever randomness is involved, you can fix the seed of the random number generator so that you always get the same results.

Try:

set.seed(seed=123)
# your code here

The seed can be any integer, or anything that can be converted to an integer. That's all there is to it.
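Applied to pvclust, the idea might look like the sketch below. It assumes the pvclust package is installed and uses a few columns of the built-in mtcars data purely for illustration; the helper name run_pvclust and the small nboot value are choices made for this example, not something from the question. In pvclust the randomness comes from the bootstrap resampling behind the AU/BP p-values, so the seed has to be set immediately before each call.

```r
# Assumes pvclust is installed: install.packages("pvclust")
library(pvclust)

# Illustrative data only; pvclust clusters the columns of its input.
dat <- scale(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])

# Hypothetical helper: fix the RNG state, then run the bootstrap.
run_pvclust <- function(seed) {
  set.seed(seed)
  pvclust(dat, method.hclust = "average", method.dist = "correlation",
          nboot = 100)  # small nboot only to keep the example quick
}

fit1 <- run_pvclust(123)
fit2 <- run_pvclust(123)

# TRUE when both runs produced the same bootstrap p-values
identical(fit1$edges, fit2$edges)
```

This covers the default, non-parallel mode. For parallel runs, recent pvclust versions document an iseed argument for the same purpose; check the help page of the installed version before relying on it.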

I've only used k-means. There I had to set the number of "runs" or iterations to a higher value than the default to get the same clusters on consecutive runs.
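For base R's kmeans, a minimal sketch of that approach follows. It assumes that "runs" corresponds to the nstart argument (number of random starts) and "iterations" to iter.max; the iris data and the values 25 and 50 are arbitrary choices for illustration. More random starts make the best solution more stable but do not guarantee identical output, so the sketch also fixes the seed.

```r
# Illustrative data: the four numeric columns of the built-in iris set.
x <- scale(iris[, 1:4])

set.seed(42)  # fix the random starting centres
km1 <- kmeans(x, centers = 3, nstart = 25, iter.max = 50)

set.seed(42)  # same seed, same starting centres
km2 <- kmeans(x, centers = 3, nstart = 25, iter.max = 50)

# TRUE when both runs assign every observation to the same cluster
identical(km1$cluster, km2$cluster)
```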
