
Supervised machine learning with scikit-learn

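This is my first time doing supervised machine learning. It is a fairly advanced topic (at least for me), and I find it hard to ask a precise question because I am not sure what is going wrong.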

# Create a training list and test list (looks something like this):
train = [('this hostel was nice',2),('i hate this hostel',1)]
test = [('had a wonderful time',2),('terrible experience',1)]

# Loading modules
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics

# Use a BOW representation of the reviews
vectorizer = CountVectorizer(stop_words='english') 
train_features = vectorizer.fit_transform([r[0] for r in train]) 
test_features = vectorizer.fit([r[0] for r in test])

# Fit a naive bayes model to the training data
nb = MultinomialNB()
nb.fit(train_features, [r[1] for r in train])

# Use the classifier to predict classification of test dataset
predictions = nb.predict(test_features)
actual=[r[1] for r in test]

At this point I get the error:

float() argument must be a string or a number, not 'CountVectorizer'

This confuses me, because the original ratings in my reviews have this type:

type(ratings_new[0])
int

You should change the line

test_features = vectorizer.fit([r[0] for r in test])

to:

test_features = vectorizer.transform([r[0] for r in test])

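The reason is that you have already fitted the vectorizer on the training data, so there is no need to fit it again on the test data; you only need to transform the test data with it. Note also that fit() returns the vectorizer itself rather than a feature matrix, which is why the error message mentions 'CountVectorizer'.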
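For reference, here is a minimal sketch of the whole script with that one line changed. It reuses the toy data from the question; the tuple unpacking in the list comprehensions and the final accuracy_score call are my additions for illustration, not part of the original code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Toy data from the question: (review text, rating)
train = [('this hostel was nice', 2), ('i hate this hostel', 1)]
test = [('had a wonderful time', 2), ('terrible experience', 1)]

vectorizer = CountVectorizer(stop_words='english')

# Learn the vocabulary from the training reviews and vectorize them
train_features = vectorizer.fit_transform([text for text, _ in train])

# Reuse the learned vocabulary on the test reviews: transform(), not fit()
test_features = vectorizer.transform([text for text, _ in test])

# fit() returns the vectorizer object itself, which is what the original
# code was passing to nb.predict()
returned = vectorizer.fit([text for text, _ in train])
print(returned is vectorizer)

# Fit a naive Bayes model on the training features and predict on the test set
nb = MultinomialNB()
nb.fit(train_features, [label for _, label in train])

predictions = nb.predict(test_features)
actual = [label for _, label in test]
print(metrics.accuracy_score(actual, predictions))
```

With only two training reviews, none of the words in the test reviews appear in the learned vocabulary, so the test rows are all zeros and the predictions carry no real information. On a realistically sized training set, the same code produces meaningful features.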
