AWS SageMaker Random Cut Forest
I have the AWS cpu-utilization data that NAB uses, and I am building anomaly detection on it with AWS SageMaker Random Cut Forest. I am able to execute it, but I need a deeper understanding of hyperparameter tuning. I have gone through the AWS documentation but still need to understand how the hyperparameters are selected. Are the parameters an educated guess, or do we need to calculate the mean and standard deviation of co_disp in order to infer them?

Thanks in advance.

I have tried 100 trees with a tree_size of 512 and 256 to detect anomalies, but how should I infer these parameters?
import rrcf

# Set tree parameters
num_trees = 50
shingle_size = 48
tree_size = 512

# Create a forest of empty trees
forest = []
for _ in range(num_trees):
    tree = rrcf.RCTree()
    forest.append(tree)

# Use the "shingle" generator to create a rolling window
# temp_data represents my aws_cpuutilization data
points = rrcf.shingle(temp_data, size=shingle_size)

# Create a dict to store the anomaly score of each point
avg_codisp = {}

# For each shingle...
for index, point in enumerate(points):
    # For each tree in the forest...
    for tree in forest:
        # If the tree is above the permitted size, drop the oldest point (FIFO)
        if len(tree.leaves) > tree_size:
            tree.forget_point(index - tree_size)
        # Insert the new point into the tree
        tree.insert_point(point, index=index)
        # Compute codisp on the new point and take the average among all trees
        if index not in avg_codisp:
            avg_codisp[index] = 0
        avg_codisp[index] += tree.codisp(index) / num_trees

# Collect the averaged scores in insertion order
values = list(avg_codisp.values())
Thanks for your interest in RandomCutForest. If you have labeled anomalies, we recommend that you use SageMaker Automatic Model Tuning ( https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html ) and let SageMaker find the combination that works best.
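As a rough illustration of what such a tuning job could search over, here is a minimal sketch that builds a tuning configuration as a plain Python dict, in the shape accepted by the SageMaker CreateHyperParameterTuningJob API. The helper name `rcf_tuning_config`, the job limits, the objective metric `test:f1`, and the integer ranges are assumptions for illustration; check them against the current Random Cut Forest tuning documentation before use.

```python
def rcf_tuning_config(max_jobs=20, max_parallel_jobs=2):
    """Build a tuning-job config dict for Random Cut Forest (illustrative only).

    Assumptions: the objective metric name and the parameter ranges below
    are taken from memory of the SageMaker docs and should be verified.
    """
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            # Assumed metric; it needs a labeled test channel to be computed.
            "MetricName": "test:f1",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        "ParameterRanges": {
            # The API expects range bounds as strings.
            "IntegerParameterRanges": [
                {"Name": "num_samples_per_tree", "MinValue": "1", "MaxValue": "2048"},
                {"Name": "num_trees", "MinValue": "50", "MaxValue": "1000"},
            ]
        },
    }
```

The resulting dict would be passed as the tuning-job configuration alongside a training job definition; that part is omitted here because it depends on your role, data locations, and instance choices.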
Heuristically, if you know that your data contains 0.4% anomalies, for example, you would set the number of samples per tree to N = 1 / (0.4 / 100) = 250. The idea behind this is that each tree represents a sample of your data, and each data point in a tree is considered "normal". If your trees have too few points, e.g. 10, then most points will look different from these "normal" ones, i.e. they will have a high anomaly score.
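The arithmetic of this heuristic can be sketched as a small helper. The function name `samples_per_tree` is hypothetical, and rounding to the nearest integer is an assumption, since the heuristic above only gives the ratio.

```python
def samples_per_tree(anomaly_percent):
    """Heuristic tree size: the reciprocal of the expected anomaly fraction."""
    if not 0 < anomaly_percent <= 100:
        raise ValueError("anomaly_percent must be in (0, 100]")
    # N = 1 / (anomaly_percent / 100), rounded to a whole number of samples
    return round(1 / (anomaly_percent / 100))


print(samples_per_tree(0.4))  # 250, matching the example above
```

In the rrcf code from the question, the value returned here would play the role of `tree_size`.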
The relation between the number of trees and the underlying data is more complex. As the range of "normal" points grows, you would want to have more trees.
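On the mean and standard deviation of the co-displacement scores mentioned in the question: a minimal sketch, assuming those statistics are used to pick a score cutoff, which is a separate decision from choosing the number of trees or the tree size. The helper name `flag_anomalies` and the three-sigma default are assumptions for illustration, not a recommendation from the documentation.

```python
from statistics import mean, pstdev


def flag_anomalies(avg_codisp, n_sigma=3.0):
    """Return the entries whose score exceeds mean + n_sigma * std.

    avg_codisp maps a point index to its averaged co-displacement score,
    as in the dict built by the code in the question.
    """
    scores = list(avg_codisp.values())
    cutoff = mean(scores) + n_sigma * pstdev(scores)
    return {index: score for index, score in avg_codisp.items() if score > cutoff}
```

With the question's variables, `flag_anomalies(avg_codisp)` would return only the indices whose averaged score sits far above the rest; a lower `n_sigma` flags more points.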