
Caret Package method = "treebag"

Here is my output from running the train function:

Bagged CART 


1251 samples
  30 predictors
   2 classes: 'N', 'Y' 


No pre-processing
Resampling: Bootstrapped (25 reps) 


Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ... 


Resampling results


  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263  

Here is my confusion matrix:

Bootstrapped (25 reps) Confusion Matrix 


(entries are percentages of table totals)

          Reference
Prediction    N       Y
         N    24.8   7.9
         Y    11.5  55.8

After partitioning the data set (80% train / 20% test), I train the model, then run "predict" on my test partition and get ~65% accuracy.
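For reference, a minimal sketch of that workflow in caret (the data frame name df and the outcome column Class are placeholders, not from the original post):

```r
library(caret)

## Stratified 80/20 split; createDataPartition samples within each class
set.seed(123)
inTrain  <- createDataPartition(df$Class, p = 0.8, list = FALSE)
training <- df[inTrain, ]
testing  <- df[-inTrain, ]

## Bagged CART via method = "treebag" (default bootstrap resampling)
fit <- train(Class ~ ., data = training, method = "treebag")

## Test-set accuracy comes from predicting on the held-out 20%
preds <- predict(fit, newdata = testing)
confusionMatrix(preds, testing$Class)
```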

Questions:

(1) Does this mean my model is not very good?
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y'? Would a logistic regression method be better?
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'. Could this be "skewing" my training / results? Do I need a ratio closer to 50/50?

Any help would be greatly appreciated!!

Code and a reproducible example would help here.

Assuming the confusion matrix came from running confusionMatrix.train, then I would say that your model looks pretty good. The difference in accuracy is a little puzzling. I regularly see test-set results look worse than the resampling results, but the bootstrap can be pretty pessimistic in measuring performance, and here it looks much better than the test set. Try a different training/test split and see if you get something similar (or try repeated 10-fold CV).
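The repeated 10-fold CV suggestion can be sketched by swapping the default bootstrap for a repeatedcv trainControl (training and Class are placeholder names for your own data):

```r
library(caret)

## Repeated 10-fold cross-validation: 10 folds, 5 repeats
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

set.seed(123)
fit_cv <- train(Class ~ ., data = training,
                method = "treebag",
                trControl = ctrl)

## Resampled Accuracy/Kappa under repeated CV, for comparison
## with the bootstrap estimates above
fit_cv$results
```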

(a) Again, hard to say with what you have posted.

(b) That model is excellent, and there is no general rule about which model is better or worse (google the "no free lunch" theorem).

(c) That imbalance isn't too bad, so I don't think it is an issue (unless the training and test set percentages are different).
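One quick way to verify that the training and test percentages match is to compare the class proportions in each partition (training and testing are placeholder names); a stratified split from createDataPartition should keep them nearly identical:

```r
## Class proportions in each partition; roughly 0.33 'N' / 0.67 'Y'
## in both if the split preserved the overall balance
prop.table(table(training$Class))
prop.table(table(testing$Class))
```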

Max
