caret bagging method = "treebag"

Question

Here is the output from running the train function:

Bagged CART 


1251 samples
  30 predictors
   2 classes: 'N', 'Y' 


No pre-processing
Resampling: Bootstrapped (25 reps) 


Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ... 


Resampling results


  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263

Here is my confusion matrix:

Bootstrapped (25 reps) Confusion Matrix 


(entries are percentages of table totals)

          Reference
Prediction    N       Y
         N    24.8   7.9
         Y    11.5  55.8

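After partitioning the data set (80% training, 20% test), I trained the model, ran "predict" on the test partition, and got roughly 65% accuracy.

Questions: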

(1) Does this mean my model is not very good?
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y' ?  Would a Logistic Regression method be better?
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'.  Could this be "skewing" my training / results?  Do I need a ratio closer to 50 - 50?

Any help would be greatly appreciated!

Answer 1

Code and a reproducible example would help here.

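Assuming the confusion matrix came from running confusionMatrix.train, I would say your model looks fine. The discrepancy in accuracy is a bit puzzling. I have seen test set results look worse than the resampling results, but the bootstrap can be quite pessimistic when measuring performance, and here it looks much better than the test set. Try a different training/test split and see whether you get similar results (or try repeated 10-fold CV).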
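As a quick consistency check, the accuracy and kappa can be recomputed from the percentage table posted in the question. The sketch below is plain Python arithmetic, not caret code; the variable names are illustrative, and it assumes the table entries are (prediction, reference) percentages of the table total, as the output states.

```python
# Cell percentages copied from the bootstrapped confusion matrix in the question.
# Keys are (prediction, reference).
table = {
    ("N", "N"): 24.8, ("N", "Y"): 7.9,
    ("Y", "N"): 11.5, ("Y", "Y"): 55.8,
}
classes = ["N", "Y"]
total = sum(table.values())

# Observed agreement: share of cases on the diagonal.
accuracy = sum(table[(c, c)] for c in classes) / total

# Chance agreement: product of row and column marginals, summed over classes.
pred_share = {c: sum(table[(c, r)] for r in classes) / total for c in classes}
ref_share = {c: sum(table[(p, c)] for p in classes) / total for c in classes}
expected = sum(pred_share[c] * ref_share[c] for c in classes)

kappa = (accuracy - expected) / (1 - expected)
print(f"accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

This gives an accuracy of 0.806 and a kappa of about 0.571, which lines up with the resampling results. The small gap to the reported kappa of 0.572 is plausibly down to rounding in the table and to the summary being averaged over resamples rather than computed from the pooled table.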

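(a) Again, it is hard to say.

(b) That model is perfectly fine. There is no general rule about which models are better or worse (google the "No Free Lunch" theorem).

(c) The imbalance is not too severe, so I don't think it is a problem (unless the training and test sets have different class percentages).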
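One way to put the accuracy figures and point (c) in context is to compare them with a majority-class baseline and to check the class shares in each partition. The sketch below is plain Python, not caret code; the 67/33 split and the two accuracy figures come from the question, while the function name `shares` and the example label lists are hypothetical.

```python
from collections import Counter

# Class shares as stated in the question (approximate).
class_share = {"Y": 0.67, "N": 0.33}

# Accuracy of a classifier that always predicts the most common class.
majority_class = max(class_share, key=class_share.get)
baseline_accuracy = class_share[majority_class]

resampled_accuracy = 0.806  # bootstrap estimate from the train output
test_accuracy = 0.65        # approximate figure quoted in the question

for name, acc in [("bootstrap", resampled_accuracy), ("test set", test_accuracy)]:
    print(f"{name}: {acc:.3f} vs. baseline {baseline_accuracy:.2f} "
          f"(difference {acc - baseline_accuracy:+.3f})")


def shares(labels):
    """Return the fraction of each class in a list of labels."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


# Hypothetical labels, only to show the check; use the real partitions instead.
train_labels = ["Y", "Y", "N", "Y", "N", "Y"]
test_labels = ["Y", "N", "Y"]
print(shares(train_labels), shares(test_labels))
```

If the two partitions show clearly different class shares, that would be one candidate explanation for the gap between the resampled and test set accuracy.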

Max
