TfIdf matrix returns the wrong number of features for BernoulliNB

Question

Using the Python library sklearn, I am trying to extract features from my training sets and fit a BernoulliNB classifier with this data.

After the classifier has been trained, I want to predict (classify) some new test data. Unfortunately, I get this error:

Traceback (most recent call last):
  File "sentiment_analysis.py", line 45, in <module>
    main()
  File "sentiment_analysis.py", line 41, in main
    prediction = classifier.predict(tfidf_data)
  File "\Python27\lib\site-packages\sklearn\naive_bayes.py", line 64, in predict
    jll = self._joint_log_likelihood(X)
  File "\Python27\lib\site-packages\sklearn\naive_bayes.py", line 724, in _joint_log_likelihood
    % (n_features, n_features_X))
ValueError: Expected input with 4773 features, got 13006 instead
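The two numbers in the error are the sizes of two different vocabularies. As a rough illustration (toy corpora made up here, not the asker's data), fitting a separate CountVectorizer on each set gives a different number of columns, and even a shared word like "good" ends up in a different column:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpora standing in for the training and test sets.
train_docs = ["good movie", "bad movie", "great plot"]
test_docs = ["terrible acting", "good acting and a great cast"]

# Fitting a vectorizer learns a vocabulary from the documents it is given.
train_vec = CountVectorizer(binary=True)
X_train = train_vec.fit_transform(train_docs)

# Fitting a second vectorizer on the test set learns a different vocabulary.
test_vec = CountVectorizer(binary=True)
X_test = test_vec.fit_transform(test_docs)

print(X_train.shape)                 # (3, 5)
print(X_test.shape)                  # (2, 6): single-letter "a" is dropped by the default tokenizer
print(train_vec.vocabulary_["good"]) # 1
print(test_vec.vocabulary_["good"])  # 3: same word, different column
```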

Here is my code:

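from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB

#Train the Classifier (load_file and preprocess are my own helpers, not shown)
data,target = load_file('validation/validation_set_5.csv')
tf_idf = preprocess(data)
classifier = BernoulliNB().fit(tf_idf, target)

#Predict test data
count_vectorizer = CountVectorizer(binary=True)  # a bool, not the string 'true'
test = count_vectorizer.fit_transform(test)
tfidf_data = TfidfTransformer(use_idf=False).fit_transform(test)
prediction = classifier.predict(tfidf_data)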

Answer 1

This is why you get this error:

test = count_vectorizer.fit_transform(test)
tfidf_data = TfidfTransformer(use_idf=False).fit_transform(test)

Here you should only reuse the old transformers that were already fitted on the training set (CountVectorizer and TfidfTransformer are your transformers).
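A minimal sketch of the question's code restructured so the fitted transformers are kept and reused. The question does not show load_file or preprocess, so this assumes preprocess does CountVectorizer(binary=True) followed by TfidfTransformer(use_idf=False), mirroring the test-side code. preprocess_fit is a hypothetical replacement that also returns the fitted transformers, and the documents and labels are made-up toy data:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB


def preprocess_fit(docs):
    """Hypothetical stand-in for the question's preprocess(): fits the
    transformers on the training documents and returns them with the matrix."""
    count_vectorizer = CountVectorizer(binary=True)
    tfidf_transformer = TfidfTransformer(use_idf=False)
    counts = count_vectorizer.fit_transform(docs)
    tf_idf = tfidf_transformer.fit_transform(counts)
    return tf_idf, count_vectorizer, tfidf_transformer


# Toy training data (made up for illustration).
train_docs = ["good movie", "bad movie", "great plot", "awful plot"]
train_labels = [1, 0, 1, 0]

tf_idf, count_vectorizer, tfidf_transformer = preprocess_fit(train_docs)
classifier = BernoulliNB().fit(tf_idf, train_labels)

# Predict test data with the transformers fitted above: transform() only, no refit.
test_docs = ["good plot", "terrible acting"]
test_counts = count_vectorizer.transform(test_docs)
tfidf_data = tfidf_transformer.transform(test_counts)
prediction = classifier.predict(tfidf_data)

print(tfidf_data.shape)  # (2, 6): the same 6 columns as the training matrix
print(prediction)
```

Words never seen during training ("terrible", "acting") have no column, so they are simply ignored rather than changing the width of the matrix.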

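fit_transform means that you fit these transformers again, this time on the new data, losing all information about the old fit, and then use them to transform "test" (they learn from the new samples and end up with a different feature set). So it returns the test set transformed into a new feature set, which is incompatible with the old feature set used for the training set. To fix this, call the transform method (not fit_transform) on the old transformers that were fitted on the training set.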
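The difference between the two methods can be seen directly on a CountVectorizer's learned vocabulary_. A small sketch with made-up documents: transform keeps the vocabulary learned from the training data, while fit_transform throws it away and learns a new one:

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ["good movie", "bad movie"]
test_docs = ["good acting", "terrible acting and plot"]

vectorizer = CountVectorizer(binary=True)
vectorizer.fit(train_docs)
train_vocab = dict(vectorizer.vocabulary_)  # snapshot of the vocabulary learned from training

# transform(): uses the existing vocabulary; unseen words are dropped.
X_ok = vectorizer.transform(test_docs)
print(X_ok.shape)                              # (2, 3)
print(vectorizer.vocabulary_ == train_vocab)   # True

# fit_transform(): discards the old vocabulary and learns a new one from the test docs.
X_bad = vectorizer.fit_transform(test_docs)
print(X_bad.shape)                             # (2, 5)
print(vectorizer.vocabulary_ == train_vocab)   # False
```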

You should write something like this:

test = old_count_vectorizer.transform(test)
tfidf_data = old_tfidf_transformer.transform(test)
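An alternative not mentioned in the original answer is to bundle the steps in an sklearn Pipeline, which makes this mistake harder to repeat: fit fits every step on the training data, and predict only calls transform on the intermediate steps before handing the result to the classifier. A sketch with the same made-up toy data, assuming the same CountVectorizer/TfidfTransformer settings as the question:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

train_docs = ["good movie", "bad movie", "great plot", "awful plot"]
train_labels = [1, 0, 1, 0]
test_docs = ["good plot", "terrible acting"]

model = make_pipeline(
    CountVectorizer(binary=True),
    TfidfTransformer(use_idf=False),
    BernoulliNB(),
)
model.fit(train_docs, train_labels)    # fits every step on the training data
prediction = model.predict(test_docs)  # transform() on each step, then BernoulliNB.predict()
print(prediction)
```

Because the raw documents go straight into model.predict, there is no separate vectorizer object on the test side that could be refitted by accident.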

Answer 1: accepted, 2015-10-05 10:42:23