
Recursion Error: Maximum recursion depth exceeded

from __future__ import print_function
import codecs
import os

import nltk.stem
from sklearn.feature_extraction.text import CountVectorizer

english_stemmer = nltk.stem.SnowballStemmer('english')

for root, dirs, files in os.walk("/Users/Documents/corpus/source-document/test1"):
    for file in files:
        if file.endswith(".txt"):
            posts = codecs.open(os.path.join(root, file), "r", "utf-8-sig")

class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self.build_analyzer())
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')
X_train = vectorizer.fit_transform(posts)
num_samples, num_features = X_train.shape
print("#samples: %d, #features: %d" % (num_samples, num_features))     #samples: 5, #features: 25
print(vectorizer.get_feature_names())

When I run the above code on all the text files contained in the directory, it raises the following error: RecursionError: maximum recursion depth exceeded.

I tried to solve the problem with sys.setrecursionlimit, but to no avail. When I supplied a large value such as 20000, a kernel crash occurred.
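Raising the limit cannot help here, because the recursion never terminates: a higher sys.setrecursionlimit only postpones the error (and large values risk exhausting the C stack, which matches the kernel crash seen above). A minimal stdlib sketch, with a hypothetical recurse_forever function standing in for the broken method:

```python
import sys

def recurse_forever():
    # Unconditional self-call: no recursion limit is high enough
    return recurse_forever()

sys.setrecursionlimit(5000)  # raising the cap only delays the failure
try:
    recurse_forever()
except RecursionError:
    print("RecursionError still raised at the higher limit")
```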

Your error is in analyzer = super(StemmedCountVectorizer, self.build_analyzer()): you call the build_analyzer function before the super call, which causes an infinite recursion loop. Change it to analyzer = super(StemmedCountVectorizer, self).build_analyzer()
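The difference between the two forms can be shown without sklearn at all. In the following sketch (the class names are illustrative stand-ins, not the real sklearn classes), super(Cls, self) produces a proxy bound to the parent class, and the method is then called on that proxy; the broken form instead evaluates self.build_analyzer() as an argument to super, re-entering the same method forever:

```python
class WordAnalyzer:
    def build_analyzer(self):
        # Parent returns a simple whitespace tokenizer
        return lambda doc: doc.split()

class StemmedAnalyzer(WordAnalyzer):
    def build_analyzer(self):
        # Correct: bind super() to self first, then call the parent's method.
        # The broken form, super(StemmedAnalyzer, self.build_analyzer()),
        # would call self.build_analyzer() again before super() is evaluated,
        # so the method re-enters itself until the recursion limit is hit.
        analyzer = super(StemmedAnalyzer, self).build_analyzer()
        # Crude stand-in for stemming: strip a trailing 's'
        return lambda doc: [w.rstrip('s') for w in analyzer(doc)]

analyzer = StemmedAnalyzer().build_analyzer()
print(analyzer("cats and dogs"))  # ['cat', 'and', 'dog']
```

In Python 3 the same fix can be written more simply as super().build_analyzer().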

