
Hyper parameter tuning for Random Cut Forest

I have used the below hyperparameters to train the model.

    rcf.set_hyperparameters(
        num_samples_per_tree=200,
        num_trees=250,
        feature_dim=1,
        eval_metrics=["accuracy", "precision_recall_fscore"],
    )

Is there any best way to choose the num_samples_per_tree and num_trees parameters?

What are the best values for both num_samples_per_tree and num_trees?

There are natural interpretations for these two hyperparameters that can help you determine good starting approximations for HPO:

  • num_samples_per_tree -- the reciprocal of this value approximates the density of anomalies in your data set/stream. For example, if you set this to 200, then the assumption is that approximately 0.5% of the data is anomalous. Try exploring your dataset to make an educated estimate; see the sketch after this list.
  • num_trees -- the more trees in your RCF model, the less noise in the scores. That is, if more trees report that the input inference point is an anomaly, then the point is much more likely to be an anomaly than if only a few trees suggest so.
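As a rough illustration of the first point, here is a minimal sketch (plain Python; the estimated_anomaly_fraction value is a hypothetical estimate, not something prescribed by SageMaker) of turning an anomaly-rate estimate into starting hyperparameter values:

    # Sketch only: derive a starting num_samples_per_tree from a rough
    # estimate of how rare anomalies are in your data. The 0.5% figure is
    # a hypothetical estimate -- replace it with one based on your dataset.
    estimated_anomaly_fraction = 0.005

    # The reciprocal of the anomaly density is a reasonable starting point.
    num_samples_per_tree = round(1 / estimated_anomaly_fraction)  # -> 200

    # More trees mean less noise in the anomaly scores, at the cost of a
    # larger model; values in the low hundreds are a common starting range.
    num_trees = 250

    print(num_samples_per_tree, num_trees)  # 200 250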

The total number of points sampled from the input dataset is equal to num_samples_per_tree * num_trees. You should make sure that the input training set is at least this size.
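For instance, with the settings from the question this works out to 200 * 250 = 50,000 sampled points, so the training set should contain at least 50,000 records. A minimal sanity check (train_data here is a hypothetical numpy array standing in for your training set):

    import numpy as np

    num_samples_per_tree = 200
    num_trees = 250

    # Total number of points the forest samples from the training data.
    min_training_points = num_samples_per_tree * num_trees  # 50,000

    # Hypothetical stand-in for your training data; shape[0] is the record count.
    train_data = np.random.rand(60000, 1)
    assert train_data.shape[0] >= min_training_points, \
        "Training set is smaller than num_samples_per_tree * num_trees"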

(Disclosure - I helped create SageMaker Random Cut Forest)

