SciKit-Learn: Trouble with TfidfVectorizer

Question

I'm trying to use TF-IDF to extract features from the titles of text articles. I'm doing the following:

from sklearn.feature_extraction.text import TfidfVectorizer
corpus_title = result_df['_title'].tolist()
tfidf_transformer_title = TfidfVectorizer(min_df = 1, ngram_range = (1,1), use_idf = True, stop_words='english')
tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)
tfidf_df_title

However, I get an error at this line:

----> 4 tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)

The error is:

    205 
    206         if self.lowercase:
--> 207             return lambda x: strip_accents(x.lower())
    208         else:
    209             return strip_accents

AttributeError: 'NoneType' object has no attribute 'lower'

I'm not sure how it's possible to get this error. I checked the docs and it looks like TfidfVectorizer uses UTF-8 as its default encoding.

Any idea how to fix?

Thanks!

Answer 1

Try this:

tfidf_transformer_title = TfidfVectorizer(min_df = 1,lowercase = False, ngram_range = (1,1), use_idf = True, stop_words='english')
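
Note that the traceback says x is None inside x.lower(), which suggests that at least one entry in corpus_title is None (for example, an article with a missing title). Setting lowercase = False skips that call, but the None entries are still in the corpus and may cause a different error at a later step. A minimal sketch of cleaning the list first, assuming the missing titles can be treated as empty strings; clean_titles and the sample list are hypothetical names and data, not part of scikit-learn or the original code:

```python
import math

def clean_titles(titles):
    """Replace None / NaN entries with empty strings and coerce the rest to str."""
    cleaned = []
    for title in titles:
        # Missing values can show up as None or as float NaN after .tolist()
        if title is None or (isinstance(title, float) and math.isnan(title)):
            cleaned.append("")
        else:
            cleaned.append(str(title))
    return cleaned

# Hypothetical sample standing in for result_df['_title'].tolist()
corpus_title = ["Stock markets rally", None, float("nan"), "Rain expected tomorrow"]
cleaned_corpus = clean_titles(corpus_title)
print(cleaned_corpus)
```

The cleaned list can then be passed to tfidf_transformer_title.fit_transform(cleaned_corpus) with the original settings. If the titles live in a pandas column, result_df['_title'].fillna('').tolist() should have a similar effect.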
