
Why does naiveBayes return all NA's for multiclass classification in R?

Started to write this question, and then figured out the answer. Going to put it here for posterity, since it was hard to find answers on this.

I'm trying to use the naiveBayes classifier from the e1071 package. It seems to have no trouble generating predictions for new data, but I actually need the probability estimates for the classes of the new data.

Example:

> model <- naiveBayes(formula=as.factor(V11)~., data=table, laplace=3)
> predict(model, table[,1:10]) 
[1] 4 4 4 4 4 4 4 4 1 1 1 3 3 1 1
> predict(model, table[,1:10], type="raw")
       1  2  3  4
 [1,] NA NA NA NA
 [2,] NA NA NA NA
 [3,] NA NA NA NA
 [4,] NA NA NA NA
 [5,] NA NA NA NA
 [6,] NA NA NA NA
 [7,] NA NA NA NA
 [8,] NA NA NA NA
 [9,] NA NA NA NA
[10,] NA NA NA NA
[11,] NA NA NA NA
[12,] NA NA NA NA
[13,] NA NA NA NA
[14,] NA NA NA NA
[15,] NA NA NA NA

This seems absurd to me, since the fact that the model is able to output predictions means it must have probability estimates for the classes. What is causing this strange behaviour?

Some things I've already tried without success:

  • Adding type="raw" to the model construction call.
  • Using the NaiveBayes function from the klaR package instead (which cannot handle the .

An example of some data which produces this error:

table[1:5,]
  V1 V2       V3         V4        V5        V6        V7        V8        V9
1  0  0 0.000000  0.0000000  0.000000 0.0000000 0.6711444 0.7110409 0.0000000
2  0  0 0.000000  0.0000000 -1.345804 2.1978370 0.6711444 0.7110409 0.0000000
3  0  0 1.923538 -3.6718725  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
4  0  0 1.923538 -0.4079858  0.000000 0.0000000 0.0000000 0.0000000 0.8980172
5  0  0 0.000000  0.0000000 -1.345804 0.2930449 0.6711444 0.7110409 0.0000000
         V10 V11
1  0.0000000   6
2  0.0000000   3
3 -3.1316213   2
4 -0.2170431   5
5  0.0000000   4

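This happens because one of the classes in the dataset has only one instance. With a single observation, the per-class standard deviation of a numeric feature cannot be estimated (in R, sd() of a single value is NA). That NA most likely propagates into the normalised probabilities returned by type="raw", while the class predictions still come out because the NA score is skipped when the best class is picked.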
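A quick way to check whether a dataset is affected is to count the instances per class and look at the per-class standard deviation of a feature. The sketch below uses only base R; the `labels` and `values` vectors are made-up illustration data, not the data from the question.

```r
# Hypothetical toy data: class "6" occurs only once
labels <- factor(c(2, 2, 3, 3, 4, 4, 5, 5, 6))
values <- c(0.1, 0.3, 1.2, 1.4, 2.0, 2.2, 3.1, 3.3, 4.0)

# Count instances per class; any class with a count of 1 is a candidate culprit
class_counts <- table(labels)
singletons <- names(class_counts)[as.vector(class_counts) == 1]
print(singletons)

# Per-class standard deviation of the feature, the quantity a Gaussian
# naive Bayes model needs; sd() of a single value is NA
class_sd <- tapply(values, labels, sd)
print(class_sd)
```

If `singletons` is non-empty for the real label column, the classes it names are the ones to duplicate or merge before fitting.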

An easy fix for my application was to clone that record and add a tiny amount of noise, after which predict works as expected.
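The clone-and-jitter fix might look roughly like this. It is a minimal sketch, not the exact code I used: `duplicate_singletons`, its `noise_sd` argument and the `toy` data frame are hypothetical names introduced for illustration, and the noise scale is an arbitrary small value.

```r
# Hypothetical helper: append a slightly perturbed copy of every row whose
# class label occurs exactly once in the data frame
duplicate_singletons <- function(df, label_col, noise_sd = 1e-6) {
  counts <- table(df[[label_col]])
  lone <- names(counts)[as.vector(counts) == 1]
  clones <- df[as.character(df[[label_col]]) %in% lone, , drop = FALSE]
  if (nrow(clones) == 0) {
    return(df)  # nothing to fix
  }
  # Add a tiny amount of Gaussian noise to the numeric feature columns only
  for (col in setdiff(names(df), label_col)) {
    if (is.numeric(clones[[col]])) {
      clones[[col]] <- clones[[col]] + rnorm(nrow(clones), sd = noise_sd)
    }
  }
  rbind(df, clones)
}

# Made-up example: class 6 has a single row
set.seed(1)
toy <- data.frame(V1  = c(0.1, 0.3, 1.2, 1.4, 4.0),
                  V2  = c(1.0, 1.1, 2.0, 2.2, 5.0),
                  V11 = c(2, 2, 3, 3, 6))
fixed <- duplicate_singletons(toy, "V11")
print(table(fixed$V11))
```

After this step every class has at least two rows, so the per-class standard deviations are no longer NA and the model can be refitted on `fixed` in the same way as in the question.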

Edit: it seems that adding noise is not always required. Here's a really simple example that resolves the problem for the dataset posted in the question, simply by adding an extra copy of every row in the table:

> table <- as.data.frame(rbind(as.matrix(table), as.matrix(table)))
> nms <- colnames(table)
> model <- naiveBayes(table[,1:(length(nms)-1)], factor(table[,length(nms)]))
> predict(model, table[,1:(length(nms)-1)], type='raw')
                 2            3            4            5            6
 [1,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [2,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [3,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [4,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
 [5,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12
 [6,] 2.480502e-34 6.283185e-12 6.283185e-12 2.480502e-34 1.000000e+00
 [7,] 1.558542e-45 9.999975e-01 2.506622e-06 1.558542e-45 6.283170e-12
 [8,] 1.000000e+00 1.558545e-45 1.558545e-45 6.283185e-12 2.480502e-34
 [9,] 6.283185e-12 1.558545e-45 1.558545e-45 1.000000e+00 2.480502e-34
[10,] 1.558542e-45 2.506622e-06 9.999975e-01 1.558542e-45 6.283170e-12

