Testing an OpenNLP classifier model

Question

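I am currently training a model for a classifier. Yesterday I found out that the result should be more accurate if you also test the classification model you created. I searched online for how to test a model and found "testing openNLP model", but I could not get it to work. I think the reason is that I am using OpenNLP version 1.8.3 rather than 1.5. Can anyone explain how to properly test my model in this version of OpenNLP?

Thanks in advance.

Below is how I currently train the model: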

public static DoccatModel trainClassifier() throws IOException {
    // read the training data, one "category text..." sample per line
    final int iterations = 100;
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/trainingssetTest.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    // define the training parameters
    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(iterations));
    params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    // create a model from the training data
    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    return model;
}

Answer 1

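I can think of two ways to test the model. Either way, you need annotated documents (by annotated, I mean documents that an expert has already assigned to a category).

The first approach uses the opennlp DoccatEvaluator command-line tool. The syntax is something like: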

opennlp DoccatEvaluator -model model -data sampleData

Your sampleData should be formatted as

OUTCOME <document text....>

with one document per line.
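To make the format concrete, here is a minimal sketch of how one such line splits into its expected outcome and its document text. It assumes the first whitespace-separated token is the category, which is how OpenNLP's DocumentSampleStream reads training lines. The names LabeledLine and parse are hypothetical helpers for illustration, not part of OpenNLP.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical helper: splits one "OUTCOME document text" line into label and text. */
class LabeledLine {

    /** Returns (category, text), or null for a blank line or a line with no text after the label. */
    static Map.Entry<String, String> parse(String line) {
        // Split on the first run of whitespace only, so the document text stays intact.
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length < 2) {
            return null;
        }
        return new SimpleEntry<>(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        // Made-up sample line, for demonstration only.
        Map.Entry<String, String> sample = parse("sports The home team won in overtime");
        System.out.println(sample.getKey() + " -> " + sample.getValue());
    }
}
```

A line that yields null here would probably be worth checking by hand before you feed the file to the evaluator.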

The second approach is to create a DocumentCategorizer yourself, something like this (model is the DoccatModel from your question):

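DocumentCategorizer categorizer = new DocumentCategorizerME(model);

// could also use: Tokenizer tokenizer = new TokenizerME(tokenizerModel)
Tokenizer tokenizer = WhitespaceTokenizer.INSTANCE; // INSTANCE is a static field, not a method

int right = 0;
int wrong = 0;

// lineStream is an ObjectStream<String> like the one in your question,
// opened on annotated data in the "OUTCOME text" format
for (String sample = lineStream.read(); sample != null; sample = lineStream.read()) {
    String[] tokens = tokenizer.tokenize(sample);
    if (tokens.length < 2) {
        continue; // blank line, or a label with no text
    }
    // the first token is the expected outcome; the rest is the document (needs java.util.Arrays)
    String expected = tokens[0];
    String[] docTokens = Arrays.copyOfRange(tokens, 1, tokens.length);
    double[] outcomeProb = categorizer.categorize(docTokens);
    String sampleOutcome = categorizer.getBestCategory(outcomeProb);

    // check if the outcome is right and keep track of # right and wrong
    if (expected.equals(sampleOutcome)) {
        right++;
    } else {
        wrong++;
    }
}
// calculate the agreement metric of your choice from right and wrong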

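I typed this code straight into the answer, so there may be a syntax error or two (which I or the SO community can fix), but the idea for evaluating your model is this: run through your data, tokenize each document, run it through the document categorizer, and keep track of the results.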
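As one possible way to do the "keep track of the results" step, the sketch below counts expected versus predicted categories and derives accuracy from those counts. It uses only the Java standard library: the classifier is passed in as a plain function, so the class name AccuracySketch, its methods, and the toy keyword classifier are all assumptions made for illustration. With OpenNLP you would presumably pass something like tokens -> categorizer.getBestCategory(categorizer.categorize(tokens)) instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical evaluation helper; the classifier is supplied as a function from tokens to a category. */
class AccuracySketch {

    /** Builds expected -> predicted -> count from lines in "OUTCOME text" format. */
    static Map<String, Map<String, Integer>> confusion(List<String> lines,
                                                       Function<String[], String> classify) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2) {
                continue; // skip blank lines and labels with no text
            }
            String expected = tokens[0];
            String predicted = classify.apply(Arrays.copyOfRange(tokens, 1, tokens.length));
            counts.computeIfAbsent(expected, k -> new TreeMap<>()).merge(predicted, 1, Integer::sum);
        }
        return counts;
    }

    /** Fraction of samples whose predicted category equals the expected one. */
    static double accuracy(Map<String, Map<String, Integer>> counts) {
        int right = 0;
        int total = 0;
        for (Map.Entry<String, Map<String, Integer>> row : counts.entrySet()) {
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                if (row.getKey().equals(cell.getKey())) {
                    right += cell.getValue();
                }
                total += cell.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) right / total;
    }

    public static void main(String[] args) {
        // Stand-in classifier (an assumption for the demo): "spam" if the text contains "free".
        Function<String[], String> toy =
            tokens -> Arrays.asList(tokens).contains("free") ? "spam" : "ham";
        // Made-up held-out samples, for demonstration only.
        List<String> heldOut = Arrays.asList(
            "spam win a free prize",
            "ham see you at lunch",
            "spam urgent account notice");
        Map<String, Map<String, Integer>> counts = confusion(heldOut, toy);
        System.out.println(counts + " accuracy=" + accuracy(counts));
    }
}
```

Keeping the full expected/predicted table rather than just two counters makes it easier to see which categories get confused with each other. For the numbers to mean much, the test lines should not be the same ones the model was trained on.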

Hope that helps...
