Why is performance (silhouette score) better in Python 2.7 than in 3.6?

Question

I have read a lot on Stack Overflow about the difference in speed between Python 2.7 and 3.6, but my question is about the quality of the results produced under the two versions.

For document clustering I used TF-IDF + KMeans, with the silhouette score to evaluate the homogeneity of my clusters.

By switching from Python 3.6 to Python 2.7, my silhouette score increased by 0.20!

**Would someone have an explanation?** Thanks!

Code:

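from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Build TF-IDF features from the documents
tfidf = TfidfVectorizer(
    stop_words=my_stopwords_str,
    max_df=0.95,
    min_df=5,
    token_pattern=r'\w{3,}',
    max_features=20)

tfidf.fit(data_final.all_text)
data_vect = tfidf.transform(data_final.all_text)

num_clusters = 15

# data_vect_lsa is not defined in the post
# (presumably an LSA-reduced version of data_vect)
kmeans = KMeans(n_clusters=num_clusters, init='k-means++',
                max_iter=300).fit(data_vect_lsa)
# Note: this is a second KMeans run with its own random initialization
kmeans_predict = KMeans(n_clusters=num_clusters, init='k-means++',
                        max_iter=300).fit_predict(data_vect_lsa)

# The score is computed on data_vect, while the labels come from data_vect_lsa
silhouette_score(data_vect, labels=kmeans_predict, metric='euclidean')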

The output for Python 2.7 is:

0.58234789374593758

The output for Python 3.6 is:

0.37524101598378656

Answer 1

Try again. A single sample is not enough.

K-means starts from a random initialization and may only find a local optimum.

It's fairly common to see different results when running it multiple times.
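To make this concrete, here is a minimal sketch of how one could measure the run-to-run spread before comparing Python versions. It assumes scikit-learn and NumPy are available; the helper `silhouette_over_seeds` and the synthetic `make_blobs` data are illustrative stand-ins, not the asker's pipeline or data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_over_seeds(X, n_clusters, seeds):
    """Fit KMeans once per seed and return the silhouette score of each run."""
    scores = []
    for seed in seeds:
        # n_init=1 keeps a single random initialization per run, so the
        # variation caused by initialization stays visible.
        model = KMeans(n_clusters=n_clusters, init='k-means++', n_init=1,
                       max_iter=300, random_state=seed)
        labels = model.fit_predict(X)
        scores.append(silhouette_score(X, labels, metric='euclidean'))
    return np.array(scores)


# Synthetic stand-in for the document vectors (an assumption, not real data).
X, _ = make_blobs(n_samples=300, centers=15, n_features=20,
                  cluster_std=3.0, random_state=0)

scores = silhouette_over_seeds(X, n_clusters=15, seeds=range(10))
print("min=%.3f  mean=%.3f  max=%.3f  std=%.3f"
      % (scores.min(), scores.mean(), scores.max(), scores.std()))
```

If the gap between the two Python versions is not clearly larger than the spread across seeds within one version, it probably reflects initialization rather than the interpreter. Passing the same `random_state` in both environments makes an individual run repeatable, although different library versions may still produce different results.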

1 2018-11-12 18:24:11
