
Multinomial naive bayes - sklearn

import numpy as np
from sklearn.naive_bayes import MultinomialNB
X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])
mnb = MultinomialNB()
mnb.fit(X, y)
mnb.predict([[0.11, 0.32]])

--> it predicts 0

Shouldn't it predict 1?

Not necessarily. You can't assume that just because a model has seen an observation during training it will predict the corresponding label correctly. This is especially true for a high-bias algorithm like Naive Bayes. High-bias models tend to oversimplify the relationship between your X and y, and what you're seeing here is a product of that. On top of that, you fit only 4 samples, which is far too few for a model to learn a robust relationship.
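One way to see the oversimplification concretely is to inspect the learned class priors. With three class-0 samples and one class-1 sample, MultinomialNB gives class 0 a head start of log(3) ≈ 1.1 in joint log likelihood before the features are even considered. A small sketch on the same data as the question:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])

mnb = MultinomialNB().fit(X, y)

# With the default fit_prior=True, the fitted priors are just the
# empirical class frequencies: log(3/4) for class 0, log(1/4) for class 1.
print(mnb.class_log_prior_)  # ~ [-0.288, -1.386]
```

That gap of log(3/4) - log(1/4) = log(3) has to be overcome by the feature likelihoods before the model will ever predict class 1, and with features this similar across rows it never is.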

If you're curious how exactly the model is creating these predictions, Multinomial Naive Bayes learns the joint log likelihoods of each class. You can actually compute those likelihoods using your fitted model:

>>> jll = mnb._joint_log_likelihood(X)
>>> jll
array([[-0.87974542, -2.02766662],
       [-0.60540174, -1.73662711],
       [-1.24051492, -2.36300468],
       [-0.54761186, -1.66776584]])
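Those numbers are not opaque: for MultinomialNB, the joint log likelihood of a row is the class log prior plus the dot product of the row's features with the per-class log feature probabilities. A sketch reproducing the array above from the fitted public attributes (same X, y, and mnb as in the question):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])
mnb = MultinomialNB().fit(X, y)

# jll[i, c] = log P(c) + sum_j X[i, j] * log P(feature j | c)
jll = X @ mnb.feature_log_prob_.T + mnb.class_log_prior_
print(jll)  # matches mnb._joint_log_likelihood(X)
```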

From there, the predict stage takes the argmax over the classes, which is where the class label prediction comes from:

>>> mnb.classes_[np.argmax(jll, axis=1)]
array([0, 0, 0, 0])
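If you prefer to see this on the probability scale, normalizing the joint log likelihoods (which is what predict_proba does internally, via logsumexp) shows how lopsided the decision is for every row:

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])
mnb = MultinomialNB().fit(X, y)

jll = mnb._joint_log_likelihood(X)
# Normalize: P(c | x) = exp(jll_c) / sum over classes of exp(jll_c')
proba = np.exp(jll - logsumexp(jll, axis=1, keepdims=True))
print(proba)                  # every row puts roughly 75% on class 0
print(mnb.predict_proba(X))   # identical
```

Even the class-1 training sample itself only gets about 25% probability for class 1, so the argmax lands on 0 everywhere.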

You can see that, as it currently stands, the model will predict 0 for all of the samples you've provided.

It depends. Here, you are using only one sample that belongs to class 1 during fitting/training. Also, you have only 2 features for each sample and only 4 samples, so the training will be poor.

import numpy as np
from sklearn.naive_bayes import MultinomialNB
X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])
mnb = MultinomialNB()
mnb.fit(X, y)

>>> mnb.predict([[0.11, 0.32]])
array([0])
>>> mnb.predict([[0.25, 0.73]])
array([0])

The model learns the rule and can successfully predict class 0, but not class 1. This is related to the trade-off between specificity and sensitivity. We also describe this by saying that the model cannot generalize the rule.
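The failure to generalize is easy to quantify: on the training data itself, the model recovers every class-0 sample but misses the only class-1 sample. A quick check with scikit-learn's confusion matrix (same X, y as the question):

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB

X = np.array([[0.25, 0.73], [0.12, 0.42], [0.53, 0.92], [0.11, 0.32]])
y = np.array([0, 0, 0, 1])
mnb = MultinomialNB().fit(X, y)

cm = confusion_matrix(y, mnb.predict(X))
print(cm)
# [[3 0]
#  [1 0]]  -> all 3 class-0 samples correct, the single class-1 sample missed
```

In other words, sensitivity (recall) for class 1 is 0 on the very data the model was trained on.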
