
AttributeError: 'list' object has no attribute 'lower' in TF-IDF

I'm trying to apply TF-IDF to a pandas column.

Data:

    all_cols
0   who is your hero and why
1   what do you do to relax
2   this is a hero
4   how many hours of sleep do you get a night
5   describe the last time you were relax

I know that to use CountVectorizer I need to turn the column into a list, and that is what I tried to do.

For the TF-IDF step I could not pass a list, so I tried to convert it to a string.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pandas as pd


df = pd.read_excel('data.xlsx')
col = df['all_cols']
corpus = col.values.tolist()

cv = CountVectorizer()
X = cv.fit_transform(corpus)

document = [' '.join(str(item)) for item in corpus]

tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True)
tfidf_transformer.fit(X)

feature_names=cv.get_feature_names()

tf_idf_vector=tfidf_transformer.transform(cv.transform([document]))

But I still get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-239-92f296939ea7> in <module>()
     16  
---> 17 tf_idf_vector=tfidf_transformer.transform(cv.transform([documento]))

AttributeError: 'list' object has no attribute 'lower'

I'm only guessing, because I don't use sklearn and you didn't post the full stack trace, but the exception suggests that the function expects a list of strings and calls lower() on each string element.

But what you are passing is a list that contains a list of strings:

corpus = [1,2,3]
document = [' '.join(str(item)) for item in corpus]

print(document)
# ['1', '2', '3']
print([document])
# [['1', '2', '3']]

I bet it will be fixed if you call this instead:

tf_idf_vector=tfidf_transformer.transform(cv.transform(document))
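One caveat about the join step itself, as a minimal sketch using only the standard library. Because str.join iterates over its argument, applying it to a single string separates every character with a space, which is probably not what was intended for text rows. The lowercase_all helper below is a hypothetical stand-in for the lowercasing a vectorizer applies to each document, not scikit-learn's actual code; the two sample rows come from the question.

```python
# Sample rows taken from the question's all_cols column.
corpus = ["who is your hero and why", "this is a hero"]

# str.join iterates over its argument; a string is iterated character by
# character, so every character ends up separated by a space.
document = [' '.join(str(item)) for item in corpus]
print(document[1])  # t h i s   i s   a   h e r o

# Wrapping the list again yields one "document" that is itself a list.
nested = [document]


def lowercase_all(docs):
    # Hypothetical stand-in for a vectorizer's lowercasing step:
    # it assumes every element of docs is a string.
    return [doc.lower() for doc in docs]


error_message = None
try:
    lowercase_all(nested)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'list' object has no attribute 'lower'

# The original list of strings passes through without any conversion.
lowered = lowercase_all(corpus)
print(lowered == corpus)  # True
```

If the column already holds strings, passing corpus directly (without the join and without the extra brackets) looks like the simplest route.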

You can use an sklearn Pipeline, which simplifies this.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline 

tf_idf = Pipeline([('cv',CountVectorizer()), ('tfidf_transformer',TfidfTransformer(smooth_idf=True,use_idf=True))])


tf_idf_vector  = tf_idf.fit_transform(corpus)
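For completeness, here is a self-contained sketch of the pipeline approach. It assumes the corpus is the list of five strings shown in the question (typed in by hand here instead of read from data.xlsx). It also compares the result with TfidfVectorizer, which scikit-learn documents as equivalent to CountVectorizer followed by TfidfTransformer; the variable names direct_vector and same are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.pipeline import Pipeline

# Assumption: the rows from the question, hard-coded instead of read from Excel.
corpus = [
    "who is your hero and why",
    "what do you do to relax",
    "this is a hero",
    "how many hours of sleep do you get a night",
    "describe the last time you were relax",
]

# Count the tokens, then reweight the counts with TF-IDF.
tf_idf = Pipeline([
    ("cv", CountVectorizer()),
    ("tfidf_transformer", TfidfTransformer(smooth_idf=True, use_idf=True)),
])
tf_idf_vector = tf_idf.fit_transform(corpus)

# One row per document in the corpus.
print(tf_idf_vector.shape[0])  # 5

# TfidfVectorizer performs both steps in a single estimator.
direct_vector = TfidfVectorizer().fit_transform(corpus)
same = np.allclose(tf_idf_vector.toarray(), direct_vector.toarray())
print(same)
```

The pipeline takes the list of strings as-is, so neither the join nor the extra list wrapping from the question is needed.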
