
Matlab: Recursion to get decision tree

I am trying to implement a decision tree through recursion. So far I have written the following:

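  1. From the given dataset, find the best split and return the branches. To give more detail, say my data has the features as the columns of a matrix, and the last column indicates the class of the data, 1 or -1.
  2. Based on step 1, I have the best feature to split on, along with the branches under that split. Say that, based on information gain, feature 9 is the best split, and the unique values in feature 9, {1,3,5}, are the branches of 9.
  3. I have figured out how to get the data associated with each branch; next I need to iterate over each branch's data to get the next set of splits. I am having trouble working out this recursion.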
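Steps 1 and 2 boil down to computing, for every feature column, the information gain IG(f) = H(Y) - sum over values v of (n_v / n) * H(Y | x_f = v), and keeping the largest. A minimal MATLAB sketch, assuming the labels are held in a separate vector `y` (as `Y_train` is in the code below) and that `X` contains only feature columns. `best_split_feature` and `label_entropy` are hypothetical names; `label_entropy` stands in for the `get_entropy` helper that the question calls but does not show.

```matlab
function [best_feature, gains] = best_split_feature(X, y)
% BEST_SPLIT_FEATURE  Information gain of every column of X with respect to
% the label vector y, plus the index of the column with the largest gain.
% Assumption: X holds only feature columns; labels are kept separately in y.
    n = size(X, 1);
    H_parent = label_entropy(y);          % H(Y) before splitting
    gains = zeros(1, size(X, 2));
    for f = 1:size(X, 2)
        vals = unique(X(:, f));           % one branch per distinct value
        H_cond = 0;                       % weighted H(Y | feature f)
        for v = 1:numel(vals)
            in_branch = X(:, f) == vals(v);
            H_cond = H_cond + (sum(in_branch) / n) * label_entropy(y(in_branch));
        end
        gains(f) = H_parent - H_cond;
    end
    [~, best_feature] = max(gains);       % first index wins on ties
end

function H = label_entropy(y)
% LABEL_ENTROPY  Shannon entropy (in bits) of a label vector, e.g. y in {-1, 1}.
    [~, ~, idx] = unique(y);
    p = accumarray(idx(:), 1) / numel(y); % class proportions, all > 0
    H = -sum(p .* log2(p));
end
```

On ties `max` returns the first index, so the lowest-numbered feature wins.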

Here is my code so far. The recursion I am doing right now does not look right; how can I fix it?

function [indeces_of_node, best_split] = split_node(X_train, Y_train)

    %cell to save split information
    feature_to_split_cell = cell(size(X_train,2)-1,4);

    %iterate over features
    for feature_idx=1:(size(X_train,2) - 1)
        %get current feature
        curr_X_feature = X_train(:,feature_idx);

        %identify the unique values
        unique_values_in_feature = unique(curr_X_feature);

        H = get_entropy(Y_train); %This is actually H(X) in slides
        %temp entropy holder

        %Storage for feature element's class
        element_class = zeros(size(unique_values_in_feature,1),2);

        %weighted conditional entropy terms, one per unique value
        H_cond = zeros(size(unique_values_in_feature,1),1); 

        for aUnique=1:size(unique_values_in_feature,1)
            match = curr_X_feature(:,1)==unique_values_in_feature(aUnique);
            mat = Y_train(match);
            majority_class = mode(mat);
            element_class(aUnique,1) = unique_values_in_feature(aUnique);
            element_class(aUnique,2) = majority_class;
            H_cond(aUnique,1) = (length(mat)/size(curr_X_feature,1)) * get_entropy(mat);
        end

        %Getting the information gain
        IG = H - sum(H_cond);

        %Storing the IG of features
        feature_to_split_cell{feature_idx, 1} = feature_idx;
        feature_to_split_cell{feature_idx, 2} = max(IG);
        feature_to_split_cell{feature_idx, 3} = unique_values_in_feature;
        feature_to_split_cell{feature_idx, 4} = element_class;
    end
    %set feature to split zero for every fold
    feature_to_split = 0;

    %getting the max IG of the fold
    max_IG_of_fold = max([feature_to_split_cell{:,2}]);

    %matrix of [value, majority class] rows for the best feature
    values_of_best_feature = [];

    %Iterate over the cell to get the index and the values under the best
    %split feature.
    for i=1:size(feature_to_split_cell,1)
        if (max_IG_of_fold == feature_to_split_cell{i,2})
            feature_to_split = i;
            values_of_best_feature = feature_to_split_cell{i,4};
        end
    end
    display(feature_to_split)
    display(values_of_best_feature(:,1)')

    curr_X_feature = X_train(:,feature_to_split);

    best_split = feature_to_split
    indeces_of_node = unique(curr_X_feature)

    %testing
    for k = 1 : length(values_of_best_feature)
        % Condition to stop the recursion: if the classes are pure, we are
        % done splitting; if both classes have the same number of attributes,
        % we are also done splitting.
        if (sum(values_of_best_feature(:,2) == -1) ~= sum(values_of_best_feature(:,2) == 1))
            if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
                mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
                [indeces_of_node, best_split] = split_node(mat1, Y_train);
            end
        end
    end
end

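Here is the output of my code. In the recursion it looks like I only go one branch deep, and after that I never come back to the remaining branches: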
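Reading the question's code, several things work against visiting every branch: the rows for a branch are selected with the hard-coded column `X_train(:,5)` rather than `feature_to_split`; `Y_train` is passed to the recursive call whole instead of being subset with the same rows as `mat1`; every call assigns to the same `[indeces_of_node, best_split]` outputs, so each branch's result overwrites the previous one; and the stop test counts the parent's per-branch majority classes in `values_of_best_feature(:,2)` instead of checking whether the labels reaching a branch are pure. A minimal ID3-style sketch that avoids these, assuming labels in a separate vector `y` and each node stored as a struct whose `children` cell holds one subtree per unique value. `build_tree`, `info_gain` and `label_entropy` are hypothetical names, not the poster's code; `max_depth` plays a role similar to the `level` counter in the answer.

```matlab
function node = build_tree(X, y, features, max_depth)
% BUILD_TREE  Hypothetical recursive ID3-style builder.
% X: n-by-d numeric features, y: n-by-1 labels (e.g. -1/1),
% features: column indices still available, max_depth: levels left (Inf = no limit).
    node.label = mode(y);             % majority class (mode picks the smallest on ties)
    node.feature = [];
    node.values = [];
    node.children = {};
    % Base cases: pure node, no features left, or depth budget used up.
    if numel(unique(y)) <= 1 || isempty(features) || max_depth == 0
        return;
    end
    % Choose the remaining feature with the largest information gain.
    gains = zeros(1, numel(features));
    for j = 1:numel(features)
        gains(j) = info_gain(X(:, features(j)), y);
    end
    [best_gain, j] = max(gains);
    if best_gain <= 1e-12
        return;                       % no split meaningfully reduces entropy
    end
    f = features(j);
    node.feature = f;
    node.values = unique(X(:, f));
    node.children = cell(numel(node.values), 1);
    remaining = features([1:j-1, j+1:end]);   % each feature used once per path
    % Visit EVERY branch: subset X and y with the same mask and keep each child.
    for k = 1:numel(node.values)
        in_branch = X(:, f) == node.values(k);
        node.children{k} = build_tree(X(in_branch, :), y(in_branch), ...
                                      remaining, max_depth - 1);
    end
end

function g = info_gain(x, y)
% Information gain of splitting labels y on the feature column x.
    g = label_entropy(y);
    vals = unique(x);
    for v = 1:numel(vals)
        m = x == vals(v);
        g = g - (sum(m) / numel(y)) * label_entropy(y(m));
    end
end

function H = label_entropy(y)
% Shannon entropy (in bits) of a non-empty label vector.
    [~, ~, idx] = unique(y);
    p = accumarray(idx(:), 1) / numel(y);
    H = -sum(p .* log2(p));
end
```

For example, `tree = build_tree(X_features, labels, 1:size(X_features, 2), Inf)` builds an unlimited-depth tree. Because each child gets its own slot in `node.children`, no branch's result overwrites another's, and the loop returns to every remaining branch after each recursive call finishes.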

feature_to_split =

     5


ans =

     1     2     3     4     5     6     7     8     9


feature_to_split =

     9


ans =

     3     5     7     8    11


feature_to_split =

    21


feature_to_split =

    21


feature_to_split =

    21


feature_to_split =

    21

If you are interested in running the code, it is available here: git

After several rounds of debugging I figured out the answer; I hope someone will benefit from it:

for k = 1 : size(values_of_best_feature, 1)
    % Rows of this node that fall into branch k of the best feature.
    in_branch = X_train(:,feature_to_split) == values_of_best_feature(k,1);
    Y_branch = Y_train(in_branch);
    % Stop the recursion when the branch is pure (only one class left).
    if numel(unique(Y_branch)) > 1
        % Subset X and Y with the same rows, then drop the feature already used.
        mat1 = X_train(in_branch,:);
        mat1(:,feature_to_split) = [];
        %if(level >= curr_level)
        % The extra arguments (1, 2, level-1) belong to the poster's revised split_node signature.
        split_node(mat1, Y_branch, 1, 2, level-1);
        %end
    end
end
return;
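The recursive call in the answer discards whatever `split_node` returns, so the splits are computed but not kept. If the tree is stored instead, for instance as nested structs with the hypothetical fields `label` (majority class at that node), `feature`, `values` and `children` (one subtree per entry of `values`, empty at a leaf), classifying a sample only needs to follow one branch per level. A minimal sketch under that assumed layout; `predict_tree` is a hypothetical name:

```matlab
function label = predict_tree(node, x)
% PREDICT_TREE  Classify one sample x (1-by-d row) by walking a tree stored as
% nested structs with fields label, feature, values, children (assumed layout).
    while ~isempty(node.children)
        % Find the branch whose split value matches the sample's feature value.
        k = find(node.values == x(node.feature), 1);
        if isempty(k)
            break;   % value never seen in training: use this node's majority label
        end
        node = node.children{k};
    end
    label = node.label;
end
```

Falling back to the majority label of the node where the walk stops is one common choice for unseen values, not the only one.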


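Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.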

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM