
SciKit-Learn: Trouble with TfidfVectorizer

I'm trying to use TFIDF to get features from titles of text articles. I'm doing the following:

from sklearn.feature_extraction.text import TfidfVectorizer
corpus_title = result_df['_title'].tolist()
tfidf_transformer_title = TfidfVectorizer(min_df = 1, ngram_range = (1,1), use_idf = True, stop_words='english')
tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)
tfidf_df_title

However, I get an error at this line:

----> 4 tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)

The error is:

    205 
    206         if self.lowercase:
--> 207             return lambda x: strip_accents(x.lower())
    208         else:
    209             return strip_accents

AttributeError: 'NoneType' object has no attribute 'lower'

I'm not sure how it's possible to get this error. I checked the docs and it looks like TfidfVectorizer uses UTF-8 as its default encoding.

Any idea how to fix?

Thanks!

Try this:

tfidf_transformer_title = TfidfVectorizer(min_df = 1, lowercase = False, ngram_range = (1,1), use_idf = True, stop_words='english')
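
Note that the AttributeError suggests at least one entry in corpus_title is None (for example, a missing title in the DataFrame) rather than an encoding problem. Setting lowercase = False only skips the call to .lower(), so a None entry may well still fail later at the tokenization step. Below is a minimal sketch of cleaning the column first. It assumes result_df is a pandas DataFrame with a '_title' column; the sample rows are made up for illustration, since the real data is not shown in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for result_df; the second title is missing on purpose.
result_df = pd.DataFrame(
    {"_title": ["Stock markets rally", None, "New vaccine trial results"]}
)

# Replace missing titles with an empty string so every entry is a str.
corpus_title = result_df["_title"].fillna("").astype(str).tolist()

tfidf_transformer_title = TfidfVectorizer(
    min_df=1, ngram_range=(1, 1), use_idf=True, stop_words="english"
)
tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)

# One row per title; the row for the missing title is simply all zeros.
print(tfidf_df_title.shape)
```

If rows without a title should not be vectorized at all, result_df['_title'].dropna() is an alternative to fillna(""), at the cost of the matrix rows no longer lining up one-to-one with result_df.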
