
How to enrich events using a very large database with Azure Stream Analytics?

I'm in the process of evaluating Azure Stream Analytics to replace a stream processing solution based on NiFi with some REST microservices.

One step is the enrichment of sensor data from a very large database of sensors (>120 GB).
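For context, the enrichment itself would be the standard reference-data JOIN pattern in the Stream Analytics query language. A minimal sketch, where the input, output, and column names (SensorStream, SensorDb, EnrichedOutput, deviceId, and so on) are all hypothetical placeholders:

```sql
-- Minimal enrichment sketch: join the streaming input to a reference input.
-- All input, output, and column names here are hypothetical.
SELECT
    s.deviceId,
    s.reading,
    s.eventTime,
    r.siteName,            -- enrichment columns pulled from the reference data
    r.calibrationOffset
INTO EnrichedOutput        -- output sink defined on the job
FROM SensorStream s        -- streaming input (e.g. Event Hubs)
JOIN SensorDb r            -- reference data input
    ON s.deviceId = r.deviceId
```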

Is this possible with Azure Stream Analytics? I tried with a very small subset of the data (60 MB) and couldn't even get it to run.

Job logs warn me that memory usage is too high. I tried scaling to 36 streaming units to see if it was even possible, to no avail.

What strategies can I use to make this work?

If I deterministically partition the input stream into N partitions by ID (via a hash function), and partition the database with the same hash function (so that a given ID ends up in the same partition on both the stream side and the database side), could I make this work? Would I need to create several separate Stream Analytics jobs to do that?
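To make the co-partitioning idea concrete, here is a sketch of what the database side could look like, assuming the reference data is extracted from a SQL database. The table name, column names, and partition count are made up, and the event producer would have to apply the exact same hash to the same ID field for the two sides to line up:

```sql
-- Hypothetical co-partitioning sketch: bucket every sensor into one of @N
-- partitions with a deterministic hash, then export each bucket as the
-- reference dataset for the job that reads the matching stream partition.
DECLARE @N int = 8;   -- number of partitions / jobs (assumption)

SELECT
    sensor_id,
    site_name,
    calibration_offset,
    ABS(CHECKSUM(sensor_id) % @N) AS partition_no   -- deterministic bucket
FROM dbo.Sensors;     -- hypothetical table name
```

Each of the N jobs would then read one stream partition and use the matching exported bucket as its (now much smaller) reference input.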

I suppose I could use 5 GB chunks, but I couldn't get it to work with an ADLS Gen2 data lake. Does reference data really only work with Azure SQL?

Stream Analytics supports reference datasets of up to 5 GB. Please note that large reference datasets come with the downside of making job/node restarts very slow (up to 20 minutes for the reference data to be distributed; restarts may be user-initiated, triggered by service updates, or caused by various errors).

If you can downsize that 120 GB to 5 GB (scoping to only the columns and rows you need, and converting to types that are smaller in size), then you should be able to run that workload. Sadly, we don't support partitioned reference data yet. This means that as of now, if you have to use ASA and can't reduce those 120 GB, you will have to deploy one distinct job for each subset of the stream/reference data.
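As an illustration of that downsizing, here is a hypothetical extract query against the source database that keeps only the columns the enrichment join actually uses and casts to narrower types (table and column names are made up):

```sql
-- Hypothetical extract to shrink the reference dataset: project only the
-- columns needed for enrichment, drop decommissioned sensors, and use
-- narrower types where precision allows.
SELECT
    sensor_id,
    site_name,
    CAST(calibration_offset AS real) AS calibration_offset,  -- float -> real
    CAST(install_year AS smallint)   AS install_year         -- int -> smallint
FROM dbo.Sensors
WHERE decommissioned = 0;
```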

Now, I'm surprised you couldn't get 60 MB of reference data to run; if you have details on what exactly went wrong, I'm happy to provide guidance.

