AWS SageMaker Random Cut Forest or Kinesis Data Analytics Random Cut Forest?

I need to put together an architecture that can detect anomalies in logs created by a web application.

The Random Cut Forest algorithm keeps coming up in my research, where it is used in two scenarios: SageMaker and Kinesis Data Analytics.

Which of these two services should I use in my architecture?

At the core, the mathematical methodology behind the two is nearly identical, but there are some differences in how it is implemented within Kinesis and SageMaker, and those differences should help drive your decision.

Kinesis RandomCutForest:

  • Streaming version of the algorithm, which is great for near-real-time updates to the model.
  • Supports time decay of older records, shingling of the input data, and, if you are using multiple dimensions, anomaly attribution that helps you understand the effect of each dimension.
  • So, if your logs are stored in CloudWatch, subscription filters (and Lambda, if needed) let you preprocess them and send them to Kinesis with little effort.
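
As a rough illustration of the CloudWatch-to-Kinesis preprocessing step, here is a minimal Python sketch of what a Lambda function in that path might do. CloudWatch Logs delivers subscription data as base64-encoded, gzip-compressed JSON (in a Lambda subscribed directly to a log group it arrives under `event["awslogs"]["data"]`). The helper names `decode_subscription_payload` and `to_kinesis_records`, and the choice of partition key, are assumptions made for this example, not part of any AWS API.

```python
import base64
import gzip
import json


def decode_subscription_payload(data_b64: str) -> list:
    """Decode a CloudWatch Logs subscription payload and return its log events.

    The payload is base64-encoded, gzip-compressed JSON.
    """
    payload = json.loads(gzip.decompress(base64.b64decode(data_b64)))
    # Control messages carry no application log lines, so skip them.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return payload.get("logEvents", [])


def to_kinesis_records(events: list, partition_key: str) -> list:
    """Shape log events as entries for the Kinesis PutRecords API.

    Each entry has the `Data` (bytes) and `PartitionKey` fields that
    boto3's `kinesis.put_records(StreamName=..., Records=...)` expects.
    The partition key used here is a hypothetical choice.
    """
    return [
        {
            "Data": json.dumps(
                {"timestamp": e["timestamp"], "message": e["message"]}
            ).encode("utf-8"),
            "PartitionKey": partition_key,
        }
        for e in events
    ]
```

Any feature extraction (for example, turning log lines into per-minute error counts) would happen between these two steps, since Random Cut Forest scores numeric columns rather than raw text.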

SageMaker RandomCutForest:

  • Batch version of the algorithm, great for large datasets (typically stored in S3) or where there's no need to update the model frequently.
  • Similar to Kinesis, supports near-real-time scoring of incoming data points via an inference endpoint, but new data points do not change the underlying model.
  • Supports hyperparameter optimization, which identifies the best set of parameters for your model (such as number of samples, number of trees, etc.).
  • Scaling up instances for both training and scoring is straightforward, and the available SageMaker Notebooks can help you preprocess and prepare your data for training.
  • So, if your dataset is large and you don't need dynamic updates to your model, the SageMaker solution should be the preferred one for you.
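
To show what consuming endpoint scores might look like, here is a minimal Python sketch. It assumes the endpoint returns JSON of the form `{"scores": [{"score": ...}, ...]}`, and it flags points whose score exceeds the mean by three standard deviations. Both the response shape and the three-sigma cutoff are assumptions for this example (the cutoff is a common heuristic, not something the service applies for you), and the function names are hypothetical.

```python
import json
import statistics


def parse_scores(response_body: str) -> list:
    """Extract anomaly scores from an endpoint response body.

    Assumes the shape {"scores": [{"score": <float>}, ...]}.
    """
    body = json.loads(response_body)
    return [item["score"] for item in body["scores"]]


def flag_anomalies(scores: list, num_std: float = 3.0) -> list:
    """Return one boolean per score: True if it exceeds mean + num_std * std.

    The cutoff is computed from the scores passed in, so in practice it
    should be derived from a representative baseline of normal traffic.
    """
    cutoff = statistics.mean(scores) + num_std * statistics.pstdev(scores)
    return [score > cutoff for score in scores]
```

Because the SageMaker model does not change as new points arrive, a cutoff computed once from a baseline can be reused until the model is retrained.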

Hope this answers your question.
