OpenNLP classifier output
I am currently using the following code to train a classifier model:
final String iterations = "1000";
final String cutoff = "0";
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
// try-with-resources closes (and flushes) the buffered stream once the model is written
try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"))) {
    model.serialize(modelOut);
}
return model;
This works fine, and after every run I get the following output:
Indexing events with TwoPass using cutoff of 0
Computing event counts... done. 1474 events
Indexing... done.
Collecting events... Done indexing in 0,03 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1474
Number of Outcomes: 2
Number of Predicates: 4149
Computing model parameters...
Stats: (998/1474) 0.6770691994572592
...done.
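Can someone explain what this output means? And does it say anything about accuracy?
Looking at the source code, we can tell that this output is produced by the NaiveBayesTrainer::trainModel method: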
public AbstractModel trainModel(DataIndexer di) {
// ...
display("done.\n");
display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
display("\t Number of Outcomes: " + numOutcomes + "\n");
display("\t Number of Predicates: " + numPreds + "\n");
display("Computing model parameters...\n");
MutableContext[] finalParameters = findParameters();
display("...done.\n");
// ...
}
If you look at the code of findParameters(), you will notice that it calls the trainingStats() method, which contains the snippet that computes the accuracy:
private double trainingStats(EvalParameters evalParams) {
// ...
double trainingAccuracy = (double) numCorrect / numEvents;
display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
return trainingAccuracy;
}
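To make the calculation concrete, here is a minimal, self-contained sketch of the same idea: count how many events are labelled correctly and divide by the total. The class name TrainingAccuracySketch, the accuracy method, and the toy label arrays are invented for illustration and are not part of OpenNLP; the real trainingStats() works on the trainer's own indexed training events.

```java
public class TrainingAccuracySketch {

    // Fraction of positions where the predicted outcome equals the expected outcome.
    public static double accuracy(int[] expected, int[] predicted) {
        if (expected.length != predicted.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        if (expected.length == 0) {
            throw new IllegalArgumentException("At least one event is required");
        }
        int numCorrect = 0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] == predicted[i]) {
                numCorrect++;
            }
        }
        // Cast before dividing: int / int would truncate the result to 0 or 1.
        return (double) numCorrect / expected.length;
    }

    public static void main(String[] args) {
        // Hypothetical outcome indices for five events (two outcomes: 0 and 1).
        int[] expected  = {0, 1, 1, 0, 1};
        int[] predicted = {0, 1, 0, 0, 0};
        System.out.println("Accuracy: " + accuracy(expected, predicted));
    }
}
```

The (double) cast mirrors the one in trainingStats(): without it, the division of two integer counts would lose the fractional part.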
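TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you are looking for. It is the training accuracy: 998 of the 1474 training events were classified correctly, roughly 67.7%.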
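If you want to read that number programmatically, for example from captured console output, the two counts can be pulled out of the Stats line and the ratio recomputed. The following is only a sketch: the class StatsLineCheck and its methods are hypothetical helpers, and the regular expression simply assumes the line format shown in the output above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatsLineCheck {

    // Matches the counts in lines such as "Stats: (998/1474) 0.6770691994572592".
    private static final Pattern STATS = Pattern.compile("Stats: \\((\\d+)/(\\d+)\\)");

    // Returns {numCorrect, numEvents} parsed from the given line.
    public static long[] parseCounts(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a Stats line: " + line);
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
    }

    // Recomputes the accuracy as numCorrect / numEvents.
    public static double accuracy(String line) {
        long[] counts = parseCounts(line);
        return (double) counts[0] / counts[1];
    }

    public static void main(String[] args) {
        String line = "Stats: (998/1474) 0.6770691994572592";
        // Locale.ROOT keeps the decimal point regardless of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.4f", accuracy(line))); // prints 0.6771
    }
}
```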