NLP: Classification gives the wrong result. How can I find out that an NLP classification result is wrong?

Question

I have started learning natural language processing and am already stumbling.

I am building my application in Node.js with the help of the NaturalNode library (the Natural Node GitHub project).

Problem

I am training the classifier on my documents with the following examples:

/// importing package
var natural = require('natural');
var classifier = new natural.BayesClassifier();



/// training documents
classifier.addDocument("h", "greetings");
classifier.addDocument("hi", "greetings");
classifier.addDocument("hello", "greetings");
classifier.addDocument("data not working", "internet_problem");
classifier.addDocument("browser not working", "internet_problem");
classifier.addDocument("google not working", "internet_problem");
classifier.addDocument("facebook not working", "internet_problem");
classifier.addDocument("internet not working", "internet_problem");
classifier.addDocument("websites not opening", "internet_problem");
classifier.addDocument("apps not working", "internet_problem");
classifier.addDocument("call drops", "voice_problem");
classifier.addDocument("voice not clear", "voice_problem");
classifier.addDocument("call not connecting", "voice_problem");
classifier.addDocument("calls not going through", "voice_problem");
classifier.addDocument("disturbance", "voice_problem");
classifier.addDocument("bye", "close");
classifier.addDocument("thank you", "feedback_positive");
classifier.addDocument("thanks", "voice_problem");
classifier.addDocument("shit", "feedback_negeive");
classifier.addDocument("shit", "feedback_negeive");
classifier.addDocument("useless", "feedback_negetive");
classifier.addDocument("siebel testing", "siebel_testing");


classifier.train();


/// running classification
console.log('result for hi');
console.log(classifier.classify('hi'));
console.log('result for hii');
console.log(classifier.classify('hii'));
console.log('result for h');
console.log(classifier.classify('h'));

Output

result for hi: greetings
result for hii: internet_problem
result for h: internet_problem

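As you can see in the results, the keyword hi is classified correctly, but if I misspell hi as hii or ih, it gives the wrong result. I can't understand how the classification works or how I should train the classifier. Is there a way to find out that a classification result is wrong, so that I can ask the user to enter the input again?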

Any help, explanation, or anything at all is highly appreciated. Thanks in advance.

Please consider me a noob and forgive any mistakes.

Answer 1

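hii and ih have never been seen by your classifier before. Unless natural.BayesClassifier does some pre-processing on the input, it has no idea what to do with them, so it classifies them using the prior probability derived from the frequency of each class label: internet_problem is the most common label among your 22 training examples.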
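To make that fallback concrete, here is a minimal sketch in plain JavaScript (no library) that recomputes the class priors from the question's training set and checks whether an input contains any token seen in training. The helper names (`tokenize`, `classPriors`, `mostFrequentLabel`, `hasKnownToken`) are made up for illustration, and the whitespace tokenizer is an assumption: it does not reproduce whatever pre-processing `natural` applies internally.

```javascript
// Training documents copied verbatim from the question (labels keep its spelling).
const trainingData = [
  ['h', 'greetings'],
  ['hi', 'greetings'],
  ['hello', 'greetings'],
  ['data not working', 'internet_problem'],
  ['browser not working', 'internet_problem'],
  ['google not working', 'internet_problem'],
  ['facebook not working', 'internet_problem'],
  ['internet not working', 'internet_problem'],
  ['websites not opening', 'internet_problem'],
  ['apps not working', 'internet_problem'],
  ['call drops', 'voice_problem'],
  ['voice not clear', 'voice_problem'],
  ['call not connecting', 'voice_problem'],
  ['calls not going through', 'voice_problem'],
  ['disturbance', 'voice_problem'],
  ['bye', 'close'],
  ['thank you', 'feedback_positive'],
  ['thanks', 'voice_problem'],
  ['shit', 'feedback_negeive'],
  ['shit', 'feedback_negeive'],
  ['useless', 'feedback_negetive'],
  ['siebel testing', 'siebel_testing'],
];

// Assumption: naive lowercase + whitespace tokenizer, for illustration only.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Prior of each class = number of documents with that label / total documents.
function classPriors(docs) {
  const counts = {};
  for (const [, label] of docs) {
    counts[label] = (counts[label] || 0) + 1;
  }
  const priors = {};
  for (const label of Object.keys(counts)) {
    priors[label] = counts[label] / docs.length;
  }
  return priors;
}

// The label a prior-only decision would pick.
function mostFrequentLabel(docs) {
  const priors = classPriors(docs);
  return Object.keys(priors).reduce((a, b) => (priors[b] > priors[a] ? b : a));
}

// True if at least one token of the input appears in the training texts.
function hasKnownToken(input, docs) {
  const vocabulary = new Set();
  for (const [text] of docs) {
    for (const token of tokenize(text)) vocabulary.add(token);
  }
  return tokenize(input).some((token) => vocabulary.has(token));
}

console.log(classPriors(trainingData));
console.log(mostFrequentLabel(trainingData)); // internet_problem (7 of 22 documents)
console.log(hasKnownToken('hii', trainingData)); // false
console.log(hasKnownToken('hi', trainingData)); // true
```

An input for which `hasKnownToken` returns false gives the classifier nothing to work with, so it is a reasonable candidate for asking the user to rephrase instead of trusting the predicted label.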

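Edit 29/12/2016: As discussed in the comments, you can handle "bad" classifications by prompting the user to re-enter any input whose classification confidence falls below a given minimum threshold: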

const MIN_CONFIDENCE = 0.2; // Tune this

var classLabel = null;
do {
    var userInput = getUserInput(); // Get user input somehow
    var classifications = classifier.getClassifications(userInput);
    var bestClassification = classifications[0];
    if (bestClassification["value"] < MIN_CONFIDENCE) {
        // Re-prompt user in the next iteration
    } else {
        classLabel = bestClassification["label"];
    }   
} while (classLabel == null);
// Do something with the label
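The decision inside that loop can also be pulled out into a small pure function, which makes the threshold easier to test and tune. This is only a sketch: `pickLabel` is a hypothetical helper, it assumes the same `{ label, value }` entries used in the snippet above, and the scores in the example are invented for illustration rather than taken from a real classifier run.

```javascript
// Returns the best label if its score reaches minConfidence, otherwise null.
// Assumes each entry looks like { label: string, value: number }.
function pickLabel(classifications, minConfidence) {
  if (!Array.isArray(classifications) || classifications.length === 0) {
    return null;
  }
  // Do not rely on the input order: find the highest-scoring entry explicitly.
  const best = classifications.reduce((a, b) => (b.value > a.value ? b : a));
  return best.value >= minConfidence ? best.label : null;
}

// Made-up scores, for illustration only.
const confident = [
  { label: 'greetings', value: 0.6 },
  { label: 'internet_problem', value: 0.1 },
];
const unsure = [
  { label: 'internet_problem', value: 0.05 },
  { label: 'voice_problem', value: 0.04 },
];

console.log(pickLabel(confident, 0.2)); // greetings
console.log(pickLabel(unsure, 0.2)); // null, so re-prompt the user
```

With the real classifier you would pass in `classifier.getClassifications(userInput)` and re-prompt whenever the function returns null. The scale of `value` depends on the classifier, so pick the threshold by looking at the scores it actually produces for good and bad inputs.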

Solution 1
2 2016-12-28 10:27:01
