
Python nested for loops do not give the expected results

I am new to Python. I am trying to run the following code, but the results do not match what I expect:

c = [0,1,2,3,4]
clus = [c0,c1,c2,c3,c4] #each element in the list is a dataframe
for i in c:
    Movie = data.Title[data.labels == i]
    for j in clus:
        vect = CountVectorizer(stop_words='english',max_features=5)
        cv_fit = vect.fit_transform(j).toarray()
        key_features = vect.get_feature_names()
    print("Cluster",i,"details:")
    print('-'*80)
    print("Key Features:", key_features)
    print("Movies in the cluster:")
    print(Movie)
    print("Movies in the cluster:",i)
    print(' ')
    print(' ')  

Expected output:


Cluster 0 details:
--------------------
Key features: ['water', 'on the', 'her', 'while', 'she']
Movies in this cluster:
One Flew Over the Cuckoo's Nest, The Sound of Music, Star Wars, Chinatown, The Bridge on the River Kwai, Apocalypse Now, Jaws, The Good, the Bad and the Ugly, Butch Cassidy and the Sundance Kid
========================================
Cluster 1 details:
--------------------
Key features: ['her', 'she', 'about', 'to her', 'that she']
Movies in this cluster:
Gone with the Wind, The Wizard of Oz, Titanic, Psycho, Sunset Blvd., Vertigo
========================================

and so on .... 

But my current output is:

Cluster 0 details:
--------------------------------------------------------------------------------
Key Features: ['water', 'on the', 'her', 'while', 'she']
Movies in the cluster:
0     One Flew Over the Cuckoo's Nest
1     The Sound of Music
3     Star Wars
4     Chinatown
6    The Bridge on the River Kwai
93    Apocalypse Now
94    Jaws
95    The Good
97    the Bad and the Ugly
99    Butch Cassidy and the Sundance Kid
Name: Title, Length: 67, dtype: object
 

Cluster 1 details:
--------------------------------------------------------------------------------
Key Features: ['water', 'on the', 'her', 'while', 'she']
Movies in the cluster:
7     Gone with the Wind
56    The Wizard of Oz
85    Titanic
89    Psycho
92    Sunset Blvd
100   Vertigo
Name: Title, dtype: object
 
and so on ...

The key features stay the same for every cluster. What should I change in my code so that the key features also change from cluster to cluster?

data.head(2) looks like the below:

     | Title         | Synopsis                           | Labels |
--------------------------------------------------------------------
0    | The Godfather | Guests are gathered last summer... | 0      |
1    | Raging Bull   | The film opens in 1964 ....        | 1      |
    

CountVectorizer is a scikit-learn class that we use in natural language processing (NLP) to turn text into token counts:

from sklearn.feature_extraction.text import CountVectorizer

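I need the cluster numbers (0, 1, 2, 3, 4), each followed by the key features of that cluster. Each cluster is a dataframe that is a subset of `data`. c0 is taken from the rows of data whose label is 0, and c1, c2, c3 and c4 are built the same way.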

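Each cluster should have its own key features, because the input for each cluster is different. But my code incorrectly prints the c0 key features for every cluster.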

Something is wrong at line 11 of the code, because it prints the same key features as cluster 0 instead of the results for cluster 1.
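
The symptom (one set of key features repeated under every cluster) is what a nested loop produces when the inner loop keeps reassigning a single variable and the print only happens after that loop has finished. A minimal sketch, using hypothetical strings in place of the dataframes and the vectorizer:

```python
# Hypothetical stand-ins: each "cluster" is just a string here.
clusters = ["features of c0", "features of c1", "features of c2"]

# Nested version: the inner loop visits every cluster for each i.
nested_result = []
for i in range(len(clusters)):
    for cluster in clusters:
        key_features = cluster          # overwritten on every inner pass
    nested_result.append(key_features)  # same leftover value for every i

# Indexed version: only the cluster that matches i is used.
indexed_result = []
for i in range(len(clusters)):
    key_features = clusters[i]
    indexed_result.append(key_features)

print(nested_result)   # the same entry three times
print(indexed_result)  # one distinct entry per cluster
```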

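A nested loop is not needed. The inner loop refits the vectorizer on every dataframe in clus and overwrites key_features each time, so the same features are printed for every value of i. Index into clus with i instead: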

from sklearn.feature_extraction.text import CountVectorizer

c = [0,1,2,3,4]
clus = [c0,c1,c2,c3,c4] #each element in the list is a dataframe
for i in c:
    Movie = data.Title[data.labels == i]
    cluster = clus[i]  # use only the dataframe that belongs to cluster i
    vect = CountVectorizer(stop_words='english',max_features=5)
    cv_fit = vect.fit_transform(cluster).toarray()
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    key_features = vect.get_feature_names_out().tolist()
    print("Cluster",i,"details:")
    print('-'*80)
    print("Key Features:", key_features)
    print("Movies in the cluster:")
    print(Movie)
    print("Movies in the cluster:",i)
    print(' ')
    print(' ')
