
Naive classifier in MATLAB

When testing a naive Bayes classifier in MATLAB, I get predictions that differ from the true labels even though I trained and tested on the same sample data. Is my code correct, and can someone explain why this happens?

%% dimensionality reduction 
columns = 6;
[U,S,V]=svds(fulldata,columns);

%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows)';

%# pick random columns of the reduced matrix U (it has only `columns` columns)
indY = randperm( size(U,2) );
indY = indY(1:columns);

%# filter data
data = U(indX,indY);

%% apply normalization method to every cell
data = zscore(data);

%use the same data sample as the training set
training_data = data;

%match the class labels to the corresponding rows
target_class = classlabels(indX,:);

%classify the same data sample to check if naive bayes works
predicted_class = classify(data, training_data, target_class, 'diaglinear');

%rows are true classes, columns are predicted classes
confusionmat(target_class, predicted_class)

Here is an example:

[image]

Notice that it confused ipsweep, teardrop and back with normal traffic. I haven't reached the stage of classifying unseen data yet; I just wanted to test whether it would correctly classify the same data it was trained on.

The confusion matrix output:

ans =

   537     0     0     0     0     0     0     1     0
     0   224     0     0     0     1     0     1     0
     0     0    91    79     0    17    24     4     0
     0     0     0     8     0     0     2     0     0
     0     0     0     0     3     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     2     0     0
     0     0     0     0     0     0     0     3     0
     0     0     0     0     0     1     0     0     1

I have no clue what this actually shows, and I probably got something wrong in my code, but I thought I would test it to see what it outputs.

You are using a classifier on data of reduced dimensionality. A classifier is meant to be slightly imprecise because it needs to generalize. In the dimensionality reduction stage you are also losing information, which further reduces classification performance.
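As a rough way to gauge how much the rank-6 truncation discards, the squared singular values that are kept can be compared against the squared Frobenius norm of the full matrix (which equals the sum of all squared singular values). This is only a sketch under assumptions: `X` is a random stand-in for `fulldata`, and `retained` is a name introduced here for illustration. Because the code in the question classifies on the z-scored columns of `U` rather than on `U*S`, the ratio is an indicator, not an exact measure of what the classifier sees.

```matlab
rng(0);                      % reproducible random numbers
X = randn(200, 20);          % hypothetical stand-in for fulldata
k = 6;                       % number of singular values kept, as in the question

[U, S, V] = svds(X, k);      % k largest singular values and their vectors

% Fraction of the total squared Frobenius norm captured by the k components
retained = sum(diag(S).^2) / norm(X, 'fro')^2;
fprintf('fraction of energy retained by %d components: %.3f\n', k, retained);
```

A value well below 1 suggests that a noticeable part of the original variation is no longer available to the classifier.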

Don't expect perfect performance even on the training set; that would be a bad case of over-fitting.
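To illustrate the point, here is a minimal sketch on made-up data: two overlapping Gaussian classes, classified with the same `'diaglinear'` option of `classify` used in the question (this needs the Statistics and Machine Learning Toolbox). All variable names and the synthetic data are assumptions for illustration, not taken from the question's dataset.

```matlab
rng(1);                                   % reproducible synthetic data
n = 200;
X = [randn(n, 2); randn(n, 2) + 1.5];     % two overlapping Gaussian classes (made up)
y = [ones(n, 1); 2 * ones(n, 1)];         % class labels 1 and 2

idx       = randperm(2 * n);              % shuffle, then split into train / held-out
train_idx = idx(1:300);
test_idx  = idx(301:end);

% Predict on the training rows themselves, and on rows the model has not seen
pred_train = classify(X(train_idx, :), X(train_idx, :), y(train_idx), 'diaglinear');
pred_test  = classify(X(test_idx, :),  X(train_idx, :), y(train_idx), 'diaglinear');

acc_train = mean(pred_train == y(train_idx));
acc_test  = mean(pred_test  == y(test_idx));
fprintf('training accuracy: %.3f, held-out accuracy: %.3f\n', acc_train, acc_test);
```

Because the two classes overlap, the fitted model cannot separate every training point, so the training accuracy stays below 1 even though the same rows were used for fitting and prediction.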

As for the confusion matrix: C(3,4)=79 means nothing more than that 79 data points whose true class is 3 were classified as class 4. The complete matrix says that your classifier works well for classes 1 and 2 but has problems with class 3. The remaining classes have almost no data, so it is difficult to judge how well the classifier works for them.
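The reading above can be checked numerically. The sketch below copies the confusion matrix from the question and computes, for each true class, the number of samples and the fraction classified correctly; the names `n_per_class`, `recall` and `accuracy` are introduced here for illustration.

```matlab
% Confusion matrix copied from the question (rows: true class, columns: predicted class)
C = [537   0   0   0   0   0   0   1   0
       0 224   0   0   0   1   0   1   0
       0   0  91  79   0  17  24   4   0
       0   0   0   8   0   0   2   0   0
       0   0   0   0   3   0   0   0   0
       0   0   0   0   0   1   0   0   0
       0   0   0   0   0   0   2   0   0
       0   0   0   0   0   0   0   3   0
       0   0   0   0   0   1   0   0   1];

n_per_class = sum(C, 2);                 % samples whose true class is each row
recall      = diag(C) ./ n_per_class;    % fraction of each true class classified correctly
accuracy    = sum(diag(C)) / sum(C(:));  % overall fraction classified correctly

disp([(1:9)' n_per_class recall]);       % columns: class index, sample count, recall
fprintf('overall accuracy: %.3f\n', accuracy);
```

Classes 1 and 2 have 538 and 226 samples and are almost always correct, class 3 has 215 samples of which only 91 (about 42%) are correct, and the other six classes have 10 or fewer samples each. Overall, 870 of the 1000 samples are classified correctly.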
