Hyperparameter tuning for Random Cut Forest
I have used the below hyperparameters to train the model.
rcf.set_hyperparameters(
    num_samples_per_tree=200,
    num_trees=250,
    feature_dim=1,
    eval_metrics=["accuracy", "precision_recall_fscore"],
)
Is there a good way to choose the num_samples_per_tree and num_trees parameters? What are the best values for num_samples_per_tree and num_trees?
There are natural interpretations for these two hyperparameters that can help you determine good starting approximations for HPO:
num_samples_per_tree -- the reciprocal of this value approximates the density of anomalies in your data set/stream. For example, if you set this to 200 then the assumption is that approximately 0.5% of the data is anomalous. Try exploring your dataset to make an educated estimate.
num_trees -- the more trees in your RCF model, the less noise in the scores. That is, if more trees report that the input inference point is an anomaly, then the point is much more likely to be an anomaly than if few trees suggest so. The total number of points sampled from the input dataset is equal to num_samples_per_tree * num_trees. You should make sure that the input training set is at least this size.
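The size requirement above can be checked with a quick sketch. The training-set size here is a hypothetical number for illustration:

```python
# Sanity check (sketch): the training set should contain at least
# num_samples_per_tree * num_trees points, since that many are sampled.
num_samples_per_tree = 200
num_trees = 250
n_training_points = 60_000  # hypothetical size of your dataset

min_required = num_samples_per_tree * num_trees  # 200 * 250 = 50,000
assert n_training_points >= min_required, (
    f"Need at least {min_required} training points, got {n_training_points}"
)
```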
(Disclosure - I helped create SageMaker Random Cut Forest)