Hyperparameter tuning for Random Cut Forest
I have used the below hyperparameters to train the model.
rcf.set_hyperparameters(
    num_samples_per_tree=200,
    num_trees=250,
    feature_dim=1,
    eval_metrics=["accuracy", "precision_recall_fscore"],
)
Is there a good way to choose the num_samples_per_tree and num_trees parameters? What are the best values for num_samples_per_tree and num_trees?
There are natural interpretations for these two hyperparameters that can help you determine good starting approximations for HPO:
num_samples_per_tree -- the reciprocal of this value approximates the density of anomalies in your data set/stream. For example, if you set this to 200 then the assumption is that approximately 0.5% of the data is anomalous. Try exploring your dataset to make an educated estimate.
num_trees -- the more trees in your RCF model, the less noise in the scores. That is, if more trees report that the input inference point is an anomaly, then the point is much more likely to be an anomaly than if few trees suggest so. The total number of points sampled from the input dataset is equal to num_samples_per_tree * num_trees. You should make sure that the input training set is at least this size.
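The size requirement above can be checked with a quick sketch. The training-set size here is a hypothetical number for illustration:

```python
# Sanity check (sketch): the training set should contain at least
# num_samples_per_tree * num_trees points, since that many are sampled.
num_samples_per_tree = 200
num_trees = 250
n_training_points = 60_000  # hypothetical size of your dataset

min_required = num_samples_per_tree * num_trees  # 200 * 250 = 50,000
assert n_training_points >= min_required, (
    f"Need at least {min_required} training points, got {n_training_points}"
)
```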
(Disclosure - I helped create SageMaker Random Cut Forest)