Removing custom words from a list in Python

Question

I am writing a function that performs custom word removal, then stemming (reducing each word to its root form), and then tf-idf.

The input to my function is a list. Custom word removal works when I try it on a single list, but when I put it into the function I get an attribute error:

AttributeError: 'list' object has no attribute 'lower'

Here is my code:

def tfidf_kw(K):    
    # Select docs in cluster K
    docs = np.array(mydata2)[km_r3.labels_==K]

    ps= PorterStemmer()
    stem_docs = []
    for doc in docs:
        keep_tokens = []
        
        for token in doc.split(' '):
            #custom stopword removal
            my_list = ['model', 'models', 'modeling', 'modelling', 'python', 
           'train','training', 'trains', 'trained','test','testing', 'tests','tested']
            
            token  = [sub_token for sub_token in list(doc) if sub_token not in my_list]

            stem_token=ps.stem(token)
            keep_tokens.append(stem_token)

        keep_tokens =' '.join(keep_tokens)
        stem_docs.append(keep_tokens)

        return(keep_tokens)

The rest of the code handles tf-idf, and it works. This is the line where I need help understanding what I am doing wrong:

token  = [sub_token for sub_token in list(doc) if sub_token not in my_list]

Here is the full error:

AttributeError  Traceback (most recent call last)
<ipython-input-154-528a540678b0> in <module>
     49     #return(sorted_df)
     50 
---> 51 tfidf_kw(0)

<ipython-input-154-528a540678b0> in tfidf_kw(K)
     20 
     21 
---> 22             stem_token=ps.stem(token)
     23             keep_tokens.append(stem_token)
     24 

~/opt/anaconda3/lib/python3.8/site-packages/nltk/stem/porter.py in stem(self, word)
    650 
    651     def stem(self, word):
--> 652         stem = word.lower()
    653 
    654         if self.mode == self.NLTK_EXTENSIONS and word in self.pool:

AttributeError: 'list' object has no attribute 'lower'

Line 51 is the call tfidf_kw(0), which is where I test the function for K=0.

Answer 1

Evidently the ps.stem method expects a single word (a string) as its argument, but you are passing it a list of strings.

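Since you are already iterating with for token in doc.split(' '), the list comprehension [... for sub_token in list(doc) ...] does not make sense to me: it rebinds token to a list on every iteration.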
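To see why the comprehension produces the wrong type, here is a small sketch. The sample string and the shortened my_list are made up for illustration and are not the asker's data. It shows that list() on a string yields single characters, and that the comprehension always returns a list, which has no lower method.

```python
# Hypothetical sample document; the asker's real data is not shown.
doc = "train a model"
my_list = ['model', 'train']

# list() on a string yields individual characters, not words.
chars = list(doc)

# No single character equals a whole word in my_list, so nothing is
# filtered out, and the result is a list rather than a string.
token = [sub_token for sub_token in chars if sub_token not in my_list]

is_list = isinstance(token, list)
has_lower = hasattr(token, "lower")  # lists do not define .lower()
```

Passing such a list to a method that calls word.lower() is what raises the AttributeError in the traceback.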

If your goal is to skip the tokens that appear in my_list, you presumably want the for token in doc.split(' ') loop to look like this:

for token in doc.split(' '):
    my_list = ['model', 'models', 'modeling', 'modelling', 'python', 
   'train','training', 'trains', 'trained','test','testing', 'tests','tested']

    if token in my_list:
        continue
    
    stem_token=ps.stem(token)
    keep_tokens.append(stem_token)

Here, if token is one of the words in my_list, the continue statement skips the rest of the current iteration and the loop moves on to the next token.
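Putting the loop above into a complete function might look like the following sketch. The name remove_and_stem, its parameters, and the sample documents are hypothetical. The stemmer is passed in as a callable so the example runs without NLTK, and toy_stem is only a stand-in that lowercases. The sketch also returns after the outer loop has finished, on the assumption that the function is meant to process every document rather than only the first.

```python
def remove_and_stem(docs, stopwords, stem):
    """Drop stopwords from each document, stem the rest, and rejoin."""
    stopwords = set(stopwords)  # set membership tests are fast
    stem_docs = []
    for doc in docs:
        # Filter whole tokens first, then stem the ones that remain.
        keep_tokens = [stem(token) for token in doc.split()
                       if token not in stopwords]
        stem_docs.append(' '.join(keep_tokens))
    return stem_docs  # returned after the loop, so every doc is processed


def toy_stem(word):
    # Stand-in for a real stemmer; it only lowercases the word.
    return word.lower()


my_list = ['model', 'models', 'training', 'python']
sample_docs = ["training a python model", "models overfit quickly"]
result = remove_and_stem(sample_docs, my_list, toy_stem)
```

With NLTK installed, PorterStemmer().stem can be passed as the stem argument in place of toy_stem. Note that the filter is case-sensitive and runs before stemming, as in the loop above, so a word like "Model" would not be removed unless the tokens are lowercased first.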
