
Error in example code from the scikit-learn documentation for the Naive Bayes classifier?

I'm a new Python user and have been fitting a Naive Bayes classifier with scikit-learn. Is the following example code from the scikit-learn Naive Bayes documentation page correct?

from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))

Shouldn't the gnb.fit() function instead read:

y_pred = gnb.fit(iris.data.drop(columns=['target']), iris.target).predict(iris.data)

That is, the response variable needs to be manually removed from the predictor dataset. I was getting unreasonably high accuracy metrics for my model when a colleague pointed out that the code I had cribbed from the scikit-learn documentation page is wrong.

iris.data is not a DataFrame; it's just a (150, 4) NumPy array holding the 4 features.

iris.target is another numpy array with just the target class.
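A quick way to check this yourself is to print the types and shapes of both objects. This is only a minimal sketch, and it assumes the default load_iris() call with no as_frame argument:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Features: one row per flower, one column per measurement.
print(type(iris.data), iris.data.shape)

# Labels: a separate 1-D array, one class index per flower.
print(type(iris.target), iris.target.shape)

# The column names of iris.data; note there is no 'target' entry.
print(iris.feature_names)
```

Since the labels live in their own array, there is nothing to remove from iris.data before calling fit.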

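I'm not sure how you could call drop on the array. I just checked, and what I get is a NumPy array rather than a pandas DataFrame, which makes sense because scikit-learn doesn't depend on pandas. So the documentation example is fine as written: the target is never part of iris.data.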
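If your own dataset does keep the label in the same table as the features, then you do need to separate them before fitting. The sketch below is illustrative only: the `combined` array is a hypothetical stand-in for such a table (built here by gluing the iris labels on as a last column), not something scikit-learn provides.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# A NumPy array has no drop method, so iris.data.drop(...) would fail.
print(hasattr(iris.data, "drop"))

# Hypothetical table where the label is stored as the last column.
combined = np.column_stack([iris.data, iris.target])

# Split the label off by slicing: all columns but the last are features.
X = combined[:, :-1]
y = combined[:, -1].astype(int)

# Same fit/predict pattern as the documentation example.
y_pred = GaussianNB().fit(X, y).predict(X)
mislabeled = int((y != y_pred).sum())
print("Number of mislabeled points out of a total %d points : %d"
      % (X.shape[0], mislabeled))
```

After slicing, X is identical to iris.data, so this reproduces the documentation example rather than changing it.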

