
Re-training Approach for NLC or R&R

The ground truth we know is used to re-train the NLC or R&R.

The ground truth is question-level training data.

e.g.

"How hot is it today?,temperature" “今天天气热吗?温度”

The question "how hot is it today?" 问题“今天有多热?” is therefore classified to "temperature" class. 因此被归为“温度”等级。

Once the application is up, real user questions will be received. Some are the same (i.e. the questions from real users are the same as the questions in the ground truth), some use similar terms, and some are new questions. Assume the application has a feedback loop to know whether or not the class (for NLC) or answer (for R&R) is relevant.
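
A minimal sketch of what that assumed feedback loop might look like, with each live interaction appended to a log alongside a relevance flag (the function, file name, and log format are all hypothetical):

    from datetime import datetime, timezone
    import csv

    def log_feedback(question, predicted_class, is_relevant, path="feedback_log.csv"):
        # Append one live interaction to the feedback log (hypothetical format).
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([
                datetime.now(timezone.utc).isoformat(),
                question,
                predicted_class,
                "relevant" if is_relevant else "not_relevant",
            ])

    # e.g. the user confirmed that the "temperature" class was the right answer
    log_feedback("How warm will it get today?", "temperature", True)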

For the new questions, the approach seems to be to just add them to the ground truth, which is then used to re-train the NLC/R&R?
For the questions with similar terms, do we add them just like the new questions, or do we ignore them, given that similar terms can also score well even when those terms are not used to train the classifier?
In the case of the same questions, there seems to be nothing to do on the ground truth for NLC; for R&R, however, do we just increase or decrease the relevance label in the ground truth by 1?

The main question here is, in short, what the re-training approach is for NLC & R&R...

Once your application has gone live, you should periodically review your feedback log for opportunities for improvement. For NLC, if there are texts being incorrectly classified, then you can add those texts to the training set and retrain in order to improve your classifier.
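
A sketch of that step, assuming the classic NLC v1 REST endpoint for creating (i.e. re-training) a classifier; the service URL, API key, classifier name, and file names are placeholders:

    import json
    import requests

    NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api"  # placeholder
    API_KEY = "..."  # placeholder credential

    # 1. Append the misclassified texts, with their correct classes, to the training CSV.
    with open("ground_truth.csv", "a", newline="") as f:
        f.write('"How warm will it get today?",temperature\n')

    # 2. Re-train by creating a new classifier from the updated CSV (training is asynchronous).
    with open("ground_truth.csv", "rb") as training_data:
        resp = requests.post(
            f"{NLC_URL}/v1/classifiers",
            auth=("apikey", API_KEY),
            files={
                "training_metadata": (None, json.dumps({"language": "en", "name": "my-classifier-v2"})),
                "training_data": training_data,
            },
        )
    resp.raise_for_status()
    print(resp.json()["classifier_id"])  # id of the new classifier instance

The application keeps using the existing classifier id until the new one finishes training and passes your tests.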

It is not necessary to capture every imaginable variation of a class, as long as your classifier is returning acceptable responses.

You could use the additional examples of classes from your log to assemble a test set of texts that do not feature in your training set. Running this test set when you make changes will enable you to determine whether or not a change has inadvertently caused a regression. You can run this test either by calling the classifier using a REST client, or via the Beta Natural Language Classifier toolkit.
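
A sketch of running such a regression test set with a REST client against the NLC v1 classify endpoint, assuming the test set is a held-out "text,expected_class" CSV; the URL, credentials, classifier id, and file name are placeholders:

    import csv
    import requests

    NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api"  # placeholder
    API_KEY = "..."        # placeholder credential
    CLASSIFIER_ID = "..."  # placeholder classifier id

    def top_class(text):
        # Classify one text and return the name of the top-ranked class.
        resp = requests.post(
            f"{NLC_URL}/v1/classifiers/{CLASSIFIER_ID}/classify",
            auth=("apikey", API_KEY),
            json={"text": text},
        )
        resp.raise_for_status()
        return resp.json()["top_class"]

    # test_set.csv: held-out "text,expected_class" rows assembled from the feedback log
    with open("test_set.csv", newline="") as f:
        tests = list(csv.reader(f))

    correct = sum(1 for text, expected in tests if top_class(text) == expected)
    print(f"accuracy: {correct}/{len(tests)}")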

A solid retraining approach should be getting feedback from live users. Your testing and validation of any retrained NLC (or R&R for that matter) should be guided by some of the principles that James Ravenscroft has outlined here ( https://brainsteam.co.uk/2016/03/29/cognitive-quality-assurance-an-introduction/ ).

The answer by @davidgeorgeuk is correct, but fails to extend the thought to the conclusion that you are looking for. I would have a monthly set of activities where I would go through application logs where REAL users are indicating that you're not classifying things correctly, and also incorporate any new classes into your classifier. I would retrain a second instance of NLC with the new data, and go through the test scenarios outlined above.

Once you are satisfied that you have IMPROVED your model, I would then switch my code to point at the new NLC instance, and the old NLC instance would become your "backup" instance, the one that you would use for this exercise the next month. It's just applying a simple DevOps approach to managing your NLC instances. You could extend this to a development, QA, production scenario if you wanted.
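
A sketch of that switch, keeping the live classifier id in a small configuration file so the application can be pointed at the newly trained instance (and rolled back, if needed) without a code change; the config file and ids are hypothetical:

    import json

    CONFIG_PATH = "nlc_config.json"  # hypothetical config the application reads at startup

    def promote(new_classifier_id):
        # Make the freshly re-trained classifier live and keep the old one as the backup.
        with open(CONFIG_PATH) as f:
            config = json.load(f)
        config["backup_classifier_id"] = config.get("live_classifier_id")
        config["live_classifier_id"] = new_classifier_id
        with open(CONFIG_PATH, "w") as f:
            json.dump(config, f, indent=2)

    # After the monthly retrain and the regression tests pass:
    promote("new-classifier-id-placeholder")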
