Started to write this question, and then figured out the answer. Going to put it here for posterity, since it was hard to find answers on this.
I'm trying to use the naiveBayes classifier from the e1071 package. It seems to have no trouble generating predictions for new data, but I actually need the probability estimates for the classes of the new data.
Example:
> model <- naiveBayes(formula=as.factor(V11)~., data=table, laplace=3)
> predict(model, table[,1:10])
[1] 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1
> predict(model, table[,1:10], type="raw")
       1  2  3  4
 [1,] NA NA NA NA
 [2,] NA NA NA NA
 [3,] NA NA NA NA
 [4,] NA NA NA NA
 [5,] NA NA NA NA
 [6,] NA NA NA NA
 [7,] NA NA NA NA
 [8,] NA NA NA NA
 [9,] NA NA NA NA
[10,] NA NA NA NA
[11,] NA NA NA NA
[12,] NA NA NA NA
[13,] NA NA NA NA
[14,] NA NA NA NA
[15,] NA NA NA NA
This seems absurd to me, since the fact that the model is able to output predictions means it must have probability estimates for the classes. What is causing this strange behaviour?
Some things I've already tried without success:
An example of some data which produces this error:
table[1:5,]
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 0 0 0.000000 0.0000000 0.000000 0.0000000 0.6711444 0.7110409 0.0000000
2 0 0 0.000000 0.0000000 -1.345804 2.1978370 0.6711444 0.7110409 0.0000000
3 0 0 1.923538 -3.6718725 0.000000 0.0000000 0.0000000 0.0000000 0.8980172
4 0 0 1.923538 -0.4079858 0.000000 0.0000000 0.0000000 0.0000000 0.8980172
5 0 0 0.000000 0.0000000 -1.345804 0.2930449 0.6711444 0.7110409 0.0000000
V10 V11
1 0.0000000 6
2 0.0000000 3
3 -3.1316213 2
4 -0.2170431 5
5 0.0000000 4
This is happening because one of the classes in the dataset has only one instance.
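To illustrate why a single-instance class can break the probabilities but not the class predictions, here is a minimal base-R sketch. It assumes, as I understand e1071 to work for numeric predictors, that each class gets a Gaussian with an estimated mean and standard deviation; the variable names and the three-class setup are made up for illustration and this is not the package's actual code.

```r
# The standard deviation needs at least two observations;
# sd() of a single value is NA.
single_class_values <- c(1.923538)      # the only training row of this class
class_sd <- sd(single_class_values)     # NA

# Gaussian log-likelihoods of one new value under three classes.
# The third class is the singleton, so its log-density is NA.
log_lik <- c(
  dnorm(0.5, mean = 0, sd = 1, log = TRUE),
  dnorm(0.5, mean = 2, sd = 1, log = TRUE),
  dnorm(0.5, mean = single_class_values, sd = class_sd, log = TRUE)
)

# Picking the most likely class still works: which.max() skips NA.
best_class <- which.max(log_lik)

# Normalising to probabilities sums over every class, so a single NA
# contaminates the whole row.
posterior <- sapply(log_lik, function(lp) 1 / sum(exp(log_lik - lp)))
```

Here `best_class` is still a valid index while every entry of `posterior` is NA, which matches the behaviour in the question: plain `predict` returns classes, and `type="raw"` returns rows of NA.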
An easy fix for my application was to clone that record and add a tiny amount of noise, after which predict works as expected.
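The clone-and-add-noise fix could be sketched roughly like this. The helper name `pad_singleton_classes`, its `noise_sd` argument and the toy data frame are all hypothetical, not part of e1071, and the noise level is an arbitrary choice you would want to adapt to the scale of your data.

```r
# Hypothetical helper: duplicate the row of every class that has exactly
# one instance and add small Gaussian noise to its numeric predictors.
pad_singleton_classes <- function(df, class_col, noise_sd = 1e-6) {
  counts <- table(df[[class_col]])
  singletons <- names(counts)[counts == 1]
  clones <- df[as.character(df[[class_col]]) %in% singletons, , drop = FALSE]
  if (nrow(clones) == 0) return(df)

  # Perturb only numeric predictors, never the class column itself.
  numeric_cols <- setdiff(names(df)[sapply(df, is.numeric)], class_col)
  for (col in numeric_cols) {
    clones[[col]] <- clones[[col]] + rnorm(nrow(clones), sd = noise_sd)
  }
  rbind(df, clones)
}

# Toy data: class 2 occurs only once.
toy <- data.frame(V1 = c(0.1, 0.3, 1.9),
                  V2 = c(0.7, 0.6, -3.1),
                  V3 = c(1, 1, 2))
padded <- pad_singleton_classes(toy, "V3")
table(padded$V3)   # every class now has at least two rows
```

After padding, every class has at least two rows, so a per-class standard deviation can be computed for each predictor. The padded data frame can then be passed to `naiveBayes` in place of the original one.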
Edit: it seems that adding noise is not always required. Here's a really simple example that fixes the problem for the dataset posted in the question, simply by adding an extra copy of every row in the table:
> table <- as.data.frame(rbind(as.matrix(table), as.matrix(table)))
> nms <- colnames(table)
> model <- naiveBayes(table[,1:(length(nms)-1)], factor(table[,length(nms)]))
> predict(model, table[,1:(length(nms)-1)], type='raw')
                 2            3            4            5            6
 [1,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [2,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [3,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [4,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
 [5,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
 [6,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [7,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [8,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [9,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
[10,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12