
Why a better performance (silhouette score) in Python 2.7 than 3.6?

I have read a lot on Stack Overflow about the difference in speed between Python 2.7 and 3.6, but my question is about performance in the sense of result quality, not speed.

For document clustering I used TF-IDF + KMeans, with the silhouette score to evaluate the homogeneity of my clusters.

By switching from Python 3.6 to Python 2.7, my silhouette score increased by about +0.20!

**Would someone have an explanation?** Thanks!

Code:

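from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# TF-IDF features, capped at 20 terms of 3+ characters
tfidf = TfidfVectorizer(
    stop_words=my_stopwords_str,
    max_df=0.95,
    min_df=5,
    token_pattern=r'\w{3,}',
    max_features=20)

data_vect = tfidf.fit_transform(data_final.all_text)

num_clusters = 15

# data_vect_lsa is not defined in this snippet (presumably an LSA reduction of data_vect).
# No random_state is set, so each KMeans call starts from a different random initialization.
kmeans = KMeans(n_clusters=num_clusters, init='k-means++',
                max_iter=300).fit(data_vect_lsa)
kmeans_predict = KMeans(n_clusters=num_clusters, init='k-means++',
                        max_iter=300).fit_predict(data_vect_lsa)

# Note: the score is computed on data_vect, not on the data_vect_lsa matrix that was clustered
silhouette_score(data_vect, labels=kmeans_predict, metric='euclidean')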

The output for Python 2.7 is:

0.58234789374593758

The output for Python 3.6 is:

0.37524101598378656    

Try again. A single sample is not enough.

K-means starts from a random initialization and may only find a local optimum.

It's fairly common to see different results when running it multiple times.
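
To see how much the score moves between runs, something like the following could be tried. It is only a minimal sketch, assuming scikit-learn and NumPy are installed: the `make_blobs` data is a synthetic stand-in for the document vectors, and `silhouette_for_seed` is a hypothetical helper, not part of the asker's code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the document vectors (assumption: not the asker's data)
X, _ = make_blobs(n_samples=300, centers=6, n_features=10,
                  cluster_std=3.0, random_state=0)


def silhouette_for_seed(X, seed, n_clusters=15):
    """Fit K-means once from the given seed and score the resulting labels."""
    km = KMeans(n_clusters=n_clusters, init='k-means++', n_init=1,
                max_iter=300, random_state=seed)
    labels = km.fit_predict(X)
    # Score on the same matrix that was clustered
    return silhouette_score(X, labels, metric='euclidean')


# Repeat the experiment with several different random initializations
seeds = range(10)
scores = np.array([silhouette_for_seed(X, s) for s in seeds])

print("per-seed scores:", np.round(scores, 3))
print("mean = %.3f, std = %.3f, min = %.3f, max = %.3f"
      % (scores.mean(), scores.std(), scores.min(), scores.max()))
```

If the spread across seeds turns out to be comparable to 0.20, the gap between the two Python versions may simply be run-to-run variation. Setting the same `random_state` (or a larger `n_init`) in both environments would make the comparison fairer.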

