I'm trying to use TF-IDF to extract features from the titles of text articles. I'm doing the following:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus_title = result_df['_title'].tolist()
tfidf_transformer_title = TfidfVectorizer(min_df = 1, ngram_range = (1,1), use_idf = True, stop_words='english')
tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)
tfidf_df_title
However, I get an error at this line:
----> 4 tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)
The error is:
205
206 if self.lowercase:
--> 207 return lambda x: strip_accents(x.lower())
208 else:
209 return strip_accents
AttributeError: 'NoneType' object has no attribute 'lower'
I'm not sure how it's possible to get this error. I checked the docs and it looks like TfidfVectorizer
uses UTF-8 as its default encoding.
Any idea how to fix?
Thanks!
Try this:
tfidf_transformer_title = TfidfVectorizer(min_df = 1,lowercase = False, ngram_range = (1,1), use_idf = True, stop_words='english')
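Note that the traceback suggests at least one entry in the title column is None. Setting lowercase=False only skips the .lower() call, so a None value will likely still fail later during tokenization. A minimal sketch of an alternative, assuming result_df is a pandas DataFrame (the small frame below is a hypothetical stand-in, not the asker's data), is to replace missing titles before vectorizing:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for result_df; the None mimics a missing title.
result_df = pd.DataFrame(
    {"_title": ["Stock markets rally", None, "New phone released today"]}
)

# Replace missing titles with empty strings and coerce every entry to str,
# so the vectorizer never receives None.
corpus_title = result_df["_title"].fillna("").astype(str).tolist()

tfidf_transformer_title = TfidfVectorizer(
    min_df=1, ngram_range=(1, 1), use_idf=True, stop_words="english"
)
tfidf_df_title = tfidf_transformer_title.fit_transform(corpus_title)

# One row per title; the row for the missing title has no nonzero entries.
print(tfidf_df_title.shape)
```

Whether to fill missing titles with an empty string or drop those rows (for example with dropna on the '_title' column) depends on whether the rows need to stay aligned with other features.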