Naive Bayes error in R: subscript out of bounds

Question

I'm trying to classify 94 speech texts. Since naiveBayes does not work well when categories in the training set are missing from the test set, I randomized the split and confirmed that the categories match. There was no problem with the categories, but the classifier did not work on the test set. The code and the error message follow:

Df.dtm<-cbind(Df.dtm, category)
dim(Df.dtm)
Df.dtm[1:10, 530:532]

# Randomize and Split data by rownumber
train <- sample(nrow(Df.dtm), ceiling(nrow(Df.dtm) * .50))
test <- (1:nrow(Df.dtm))[- train]

# Isolate classifier
cl <- Df.dtm[, "category"]
> summary(cl[train])
  dip  eds  ind pols 
  23    8    3   13 

# Create model data and remove "category"
modeldata <- Df.dtm[,!colnames(Df.dtm) %in% "category"]

#Boolean feature Multinomial Naive Bayes
#Function to convert the word frequencies to yes and no labels
convert_count <- function(x) {
  y <- ifelse(x > 0, 1,0)
  y <- factor(y, levels=c(0,1), labels=c("No", "Yes"))
  y
}

#Apply the convert_count function to get final training and testing DTMs
train.cc <- apply(modeldata[train, ], 2, convert_count)
test.cc <- apply(modeldata[test, ], 2, convert_count)

#Training the Naive Bayes Model
#Train the classifier
system.time(classifier <- naiveBayes(train.cc, cl[train], laplace = 1) )

Training the classifier worked:

   user  system elapsed
   0.45    0.00    0.46

#Use the classifier we built to make predictions on the test set.
system.time(pred <- predict(classifier, newdata=test.cc))

However, prediction failed:

Error in `[.default`(object$tables[[v]], , nd) : subscript out of bounds
Timing stopped at: 0.2 0 0.2
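One thing that may be worth checking with an error like this, offered as a hedged diagnostic sketch rather than a confirmed explanation: whether some column of the test data contains a value that never occurs in the same column of the training data. The toy matrices below are hypothetical stand-ins for train.cc and test.cc, and unseen_values is an illustrative helper written for this sketch, not a function from e1071.

```r
# Hypothetical stand-ins for train.cc and test.cc (character matrices,
# one column per term), small enough to inspect by hand.
train_toy <- cbind(apple = c("No", "Yes", "No"),
                   pear  = c("No", "No",  "No"))
test_toy  <- cbind(apple = c("Yes", "No"),
                   pear  = c("Yes", "No"))

# Illustrative helper (not an e1071 function): for each shared column, list
# the values that appear in the test data but never in the training data.
unseen_values <- function(train_m, test_m) {
  cols <- intersect(colnames(train_m), colnames(test_m))
  out <- lapply(cols, function(j) {
    setdiff(unique(test_m[, j]), unique(train_m[, j]))
  })
  names(out) <- cols
  out[lengths(out) > 0]  # keep only columns with at least one unseen value
}

unseen <- unseen_values(train_toy, test_toy)
print(unseen)
```

In this toy data only the pear column is reported, because "Yes" occurs for it in the test matrix but never in the training matrix. An empty list would mean every test value was also seen during training.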

Answer 1

Consider the following:

# Row indices of the training observations.
train <- sample(nrow(Df.dtm), ceiling(nrow(Df.dtm) * .50))

# Whatever is left over after removing the training sample. Note that this
# returns the observations themselves (rows of Df.dtm), not their indices:
test <- Df.dtm[-train,]

Once it is clear what sample() returned (row indices) and how the test set should be sliced (by rows or by columns, which needs to be settled at this point), I would adjust the second argument of the apply call. The documentation for apply covers the details, but in short: if you pass it a 2 it applies the function over each column, and if you pass it a 1 it applies the function over each row. Again, depending on whether you want to sample rows or columns, this can be tweaked either way.
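To make these two points concrete, here is a small sketch on a made-up 4 x 3 matrix m (not the poster's Df.dtm). It separates row indices from the rows they select and shows what the second argument of apply does. The names m, train_idx, test_idx and test_rows are chosen for illustration only.

```r
# Made-up matrix standing in for a document-term matrix:
# 4 documents (rows) by 3 terms (columns).
m <- matrix(c(1, 0, 2,
              0, 0, 1,
              3, 1, 0,
              0, 2, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("a", "b", "c")))

set.seed(1)  # arbitrary seed so the split is repeatable
train_idx <- sample(nrow(m), ceiling(nrow(m) * 0.5))  # row indices
test_idx  <- setdiff(seq_len(nrow(m)), train_idx)     # remaining row indices
test_rows <- m[-train_idx, , drop = FALSE]            # the rows themselves

# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column.
row_totals <- apply(m, 1, sum)  # one value per document
col_totals <- apply(m, 2, sum)  # one value per term

print(row_totals)
print(col_totals)
```

Here m[test_idx, ] and m[-train_idx, ] select the same rows. What differs is whether the object called test holds indices or data, which matters if it is later used as an index, as in modeldata[test, ].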

Answer 1
0 2017-02-03 13:24:56

R中的朴素贝叶斯错误：下标超出范围

问题描述

1 个解决方案

解决方案1 0 2017-02-03 13:24:56

解决方案1
0 2017-02-03 13:24:56