I'm a new Python user and have been running a Naive Bayes classifier model using the scikit-learn module. Is the following example code from the scikit-learn Naive Bayes documentation page correct?
from sklearn import datasets
iris = datasets.load_iris()
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
Shouldn't the gnb.fit() call instead read:
y_pred = gnb.fit(iris.data.drop(columns=['target']), iris.target).predict(iris.data)
That is, the response variable needs to be manually removed from the predictor dataset. I was getting unreasonably high accuracy metrics for my model when a colleague pointed out that the code I had cribbed from the scikit-learn documentation page is wrong.
iris.data is not a DataFrame; it's just a (150, 4) NumPy array holding the four features. iris.target is a separate NumPy array holding only the target class, so the target was never part of your predictors in the first place. I'm not sure how you could even call drop on the array (I just checked that it is an array and not a pandas DataFrame, which makes sense: scikit-learn doesn't depend on pandas). The documentation example is correct as written.
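A quick sanity check, using the same load_iris loader as the question, confirms that the features and the target come back as two separate arrays:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data holds only the four feature columns
print(type(iris.data))        # <class 'numpy.ndarray'>
print(iris.data.shape)        # (150, 4)
print(iris.feature_names)     # four feature names; no 'target' among them

# iris.target is a separate array of class labels
print(iris.target.shape)      # (150,)

# ndarrays have no drop method, so the proposed fix would raise AttributeError
print(hasattr(iris.data, "drop"))  # False
```

(Note that newer scikit-learn versions also accept load_iris(as_frame=True) to get pandas objects, but even then the returned data contains only the feature columns, not the target.)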