
AODE Machine Learning in R

I wanted to know whether AODE can really be better than Naive Bayes, as the description says:

https://cran.r-project.org/web/packages/AnDE/AnDE.pdf

--> "AODE achieves highly accurate classification by averaging over all of a small space."

https://www.quora.com/What-is-the-difference-between-a-Naive-Bayes-classifier-and-AODE

--> "AODE is a weird way of relaxing naive bayes' independence assumptions. It is no longer a generative model, but it relaxes the independence assumptions in a slightly different (and less principled) way than logistic regression does. It replaces the convex optimization problem used in training a logistic regression classifier by a quadratic (on the number of features) dependency on both training and test times."
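For reference, the two classifiers can be written side by side. This is a sketch of the standard textbook formulation, not something taken from the AnDE documentation; the symbols $n$ (number of attributes), $x_i$ (the value of attribute $i$) and $y$ (the class) are notation introduced here for illustration.

```latex
% Naive Bayes: every attribute depends on the class only.
\hat{P}(y \mid x_1, \dots, x_n) \;\propto\; \hat{P}(y) \prod_{j=1}^{n} \hat{P}(x_j \mid y)

% AODE: average over n one-dependence models; in model i every
% attribute depends on the class and on the "parent" attribute x_i.
\hat{P}(y \mid x_1, \dots, x_n) \;\propto\; \sum_{i=1}^{n} \hat{P}(y, x_i) \prod_{j=1}^{n} \hat{P}(x_j \mid y, x_i)
```

The sum over $i$ with a product over $j$ inside it is where the quadratic dependence on the number of features comes from. The probabilities are estimated from frequency counts, which is why the attributes have to be discrete.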

But when I experimented with it, the prediction results seemed off. I implemented it with this code:

library(gmodels)
library(AnDE)
AODE_Model = aode(iris)
predict_aode = predict(AODE_Model, iris)
CrossTable(as.numeric(iris$Species), predict_aode) 

[Image: matrix table]

Can anyone explain this to me? Or are there any good practical solutions for implementing AODE? Thank you in advance.

If you check out the vignette for the function:

train: data.frame : training data. It should be a data frame. AODE works only discretized data. It would be better to discreetize the data frame before passing it to this function. However, aode discretizes the data if not done before hand. It uses an R package called discretization for the purpose. It uses the well known MDL discretization technique. (It might fail sometimes)
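To make "discretized" concrete, here is a minimal base-R sketch of equal-frequency binning. It is not what aode does internally (the documentation quoted above names the MDL technique from the discretization package), and `bin_equal_freq` is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: cut a numeric vector into k bins that each hold
# roughly the same number of observations (equal-frequency binning).
bin_equal_freq <- function(x, k = 3) {
  # Bin edges at the 0, 1/k, ..., 1 quantiles; unique() guards against ties.
  edges <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1)))
  cut(x, breaks = edges, include.lowest = TRUE)
}

# Discretize the four numeric iris columns and keep the class label last.
binned <- data.frame(lapply(iris[, 1:4], bin_equal_freq, k = 3),
                     Species = iris$Species)

str(binned)                                 # every column is now a factor
table(binned$Petal.Length, binned$Species)  # how the bins line up with species
```

Tabulating a binned column against the class is a quick way to see whether a given number of bins still separates the species.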

By default, the discretize function from arules cuts each variable into 3 bins, which may not be enough for iris. So I first reproduce your result using discretization by arules:

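library(arules)   # discretize()
library(gmodels)  # CrossTable()
library(AnDE)     # aode()

# Discretize each numeric column into 3 bins and keep the class label last.
# indata must exist before it is used to draw the train/test indices.
indata <- data.frame(lapply(iris[,1:4],discretize,breaks=3),Species=iris$Species)

# 100 rows for training, the remaining 50 for testing
set.seed(111)
trn = sample(1:nrow(indata),100)
test = setdiff(1:nrow(indata),trn)

AODE_Model = aode(indata[trn,])
predict_aode = predict(AODE_Model, indata[test,])
CrossTable(as.numeric(indata$Species)[test], predict_aode)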

                                 | predict_aode 
as.numeric(indata$Species)[test] |         1 |         3 | Row Total | 
---------------------------------|-----------|-----------|-----------|
                               1 |        15 |         5 |        20 | 
                                 |     0.500 |     4.500 |           | 
                                 |     0.750 |     0.250 |     0.400 | 
                                 |     0.333 |     1.000 |           | 
                                 |     0.300 |     0.100 |           | 
---------------------------------|-----------|-----------|-----------|
                               2 |        11 |         0 |        11 | 
                                 |     0.122 |     1.100 |           | 
                                 |     1.000 |     0.000 |     0.220 | 
                                 |     0.244 |     0.000 |           | 
                                 |     0.220 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|
                               3 |        19 |         0 |        19 | 
                                 |     0.211 |     1.900 |           | 
                                 |     1.000 |     0.000 |     0.380 | 
                                 |     0.422 |     0.000 |           | 
                                 |     0.380 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|
                    Column Total |        45 |         5 |        50 | 
                                 |     0.900 |     0.100 |           | 
---------------------------------|-----------|-----------|-----------|

You can see that one of the classes is missing from the predictions. Let's increase the number of breaks to 4:

indata <- data.frame(lapply(iris[,1:4],discretize,breaks=4),Species=iris$Species)
AODE_Model = aode(indata[trn,])
predict_aode = predict(AODE_Model, indata[test,])
CrossTable(as.numeric(indata$Species)[test], predict_aode)

                                 | predict_aode 
as.numeric(indata$Species)[test] |         1 |         2 |         3 | Row Total | 
---------------------------------|-----------|-----------|-----------|-----------|
                               1 |        20 |         0 |         0 |        20 | 
                                 |    18.000 |     4.800 |     7.200 |           | 
                                 |     1.000 |     0.000 |     0.000 |     0.400 | 
                                 |     1.000 |     0.000 |     0.000 |           | 
                                 |     0.400 |     0.000 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                               2 |         0 |        10 |         1 |        11 | 
                                 |     4.400 |    20.519 |     2.213 |           | 
                                 |     0.000 |     0.909 |     0.091 |     0.220 | 
                                 |     0.000 |     0.833 |     0.056 |           | 
                                 |     0.000 |     0.200 |     0.020 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                               3 |         0 |         2 |        17 |        19 | 
                                 |     7.600 |     1.437 |    15.091 |           | 
                                 |     0.000 |     0.105 |     0.895 |     0.380 | 
                                 |     0.000 |     0.167 |     0.944 |           | 
                                 |     0.000 |     0.040 |     0.340 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                    Column Total |        20 |        12 |        18 |        50 | 
                                 |     0.400 |     0.240 |     0.360 |           | 
---------------------------------|-----------|-----------|-----------|-----------|

It gets only 3 wrong. To me, it's a matter of tuning the discretization without overfitting, which can be tricky.
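As a quick check of that count, the cell counts from the breaks=4 cross table can be turned into an accuracy figure. This is a small base-R sketch; the matrix is typed in by hand from the table above, and the variable names are made up for the illustration.

```r
# Counts copied from the breaks = 4 cross table
# (rows: actual class, columns: predicted class).
confusion <- matrix(c(20,  0,  0,
                       0, 10,  1,
                       0,  2, 17),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(actual    = c("1", "2", "3"),
                                    predicted = c("1", "2", "3")))

n_correct <- sum(diag(confusion))          # correct predictions on the diagonal
accuracy  <- n_correct / sum(confusion)    # share of the 50 test rows
per_class_recall <- diag(confusion) / rowSums(confusion)

n_correct                   # 47
accuracy                    # 0.94
round(per_class_recall, 3)  # 1.000 0.909 0.895
```

The per-class values match the row proportions that CrossTable prints on the diagonal (1.000, 0.909, 0.895).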


 