
Missing a positional argument: 'raw_documents', what does it refer to?

I am trying to define vectorizer parameters for use in a model, but Python keeps saying that I am missing a parameter. Reviews is a list of restaurant reviews I have scraped from Yelp. The problem occurs with .fit_transform(). I have the following:

from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=200000,
                                 min_df=0.2, stop_words='english',
                                 use_idf=True, tokenizer=tokenize_and_stem, ngram_range=(1,3))
%time tfidf_matrix = TfidfVectorizer.fit_transform(Reviews) 
print(tfidf_matrix)

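You created the tfidf_vectorizer object but never used it. TfidfVectorizer.fit_transform(Reviews) calls the method on the class itself, so Reviews is bound to self and raw_documents (the collection of documents to vectorize) is left missing. You should use tfidf_vectorizer.fit_transform(Reviews).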
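To see why the error names raw_documents, here is a minimal sketch that needs no scikit-learn. SketchVectorizer is a hypothetical stand-in written for this illustration, not the real TfidfVectorizer; it is only assumed to share the same method shape, fit_transform(self, raw_documents).

```python
class SketchVectorizer:
    """Hypothetical stand-in with the same method shape as TfidfVectorizer."""

    def fit_transform(self, raw_documents):
        # Return one lowercase token list per document, just to have a result.
        return [doc.lower().split() for doc in raw_documents]


reviews = ["Great tacos", "Slow service"]

# Called through the class: `reviews` is bound to `self`,
# so nothing is left over for `raw_documents`.
try:
    SketchVectorizer.fit_transform(reviews)
except TypeError as exc:
    error_message = str(exc)

# Called through an instance: `self` is filled in automatically,
# and `reviews` is bound to `raw_documents` as intended.
vectorizer = SketchVectorizer()
tokens = vectorizer.fit_transform(reviews)

print(error_message)
print(tokens)
```

The same binding rule applies to any instance method, which is why the fix in the question is to call fit_transform on tfidf_vectorizer rather than on TfidfVectorizer.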

When you use .fit_transform, you need to pass an iterable of strings, such as a list or tuple, with one string per document.

Example:

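from sklearn.feature_extraction.text import TfidfVectorizer
documents = ["first review", "second review", "third review"]  # Here is your data; the default tokenizer drops one-character tokens
TfidfVectorizer().fit_transform(documents)  # call the method on an instance, not on the class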

It is important that your data set contains no null or None values.
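One way to enforce that is to filter the list before fitting. The sketch below is an illustration, not part of the original answer: drop_empty_documents is a hypothetical helper name, and discarding blank strings along with None is a choice made here, on the assumption that an empty review carries no useful terms.

```python
def drop_empty_documents(raw_documents):
    """Keep only entries that are non-blank strings."""
    return [
        doc for doc in raw_documents
        if isinstance(doc, str) and doc.strip()
    ]


# Sample data standing in for scraped reviews; None marks a failed scrape.
reviews = ["Great tacos", None, "", "   ", "Slow service"]
clean_reviews = drop_empty_documents(reviews)

print(clean_reviews)
```

The cleaned list can then be passed to the vectorizer instance, for example tfidf_vectorizer.fit_transform(clean_reviews).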

If you have only one document, wrap it in a list; this also works:

documents = ["Only Value"]  # a single document still goes inside a list
TfidfVectorizer().fit_transform(documents)


 