How to peek recent data from Azure Event Hub? (.NET)
I have a question about how to peek at recent data in Event Hub from a .NET application. Our requirement is to read the last day of data in Event Hub, repeatedly, once every hour. For example, at 6 o'clock I want the data from 6 o'clock yesterday to 6 o'clock today. Then, at 7 o'clock, I want the data from 7 o'clock yesterday to 7 o'clock today.
I have tried to receive events from Azure Event Hubs by following the tutorial, but it doesn't satisfy my requirement. My understanding of the receiving process is that every time a new event arrives in Event Hub, a signal is raised and the EventProcessorHost class is triggered to get the event data. (I'm not sure whether my understanding is right.) However, with this method an event can only be accessed once. It cannot be accessed in later receive operations because it disappears from Event Hub.
Is there a way to meet the requirement above?
I would also like to know how to use the "offset" when receiving events. I understand the concept, but a demo of how to use it would help.
I would appreciate any advice. :)
This is not a scenario that Event Hub is designed for. It is designed to handle incoming data at scale, and the typical use case is to process that data as fast as possible.
In your case you want to process largely the same data over and over again, at one-hour intervals. The EventProcessorHost is meant to run continuously in a background process.
It might be a lot easier to store incoming data in blob storage using a path that includes the date and hour, such as container/date/hour/blob1.json (for example container/2019-12-22/07/blob1.json), and to have, for example, a schedule-triggered Azure Function that then knows which blobs to process based on the time it is triggered.
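To make the path idea concrete, here is a minimal sketch of how a scheduled job could work out which hourly folders cover the last 24 hours. It is plain date arithmetic with no Azure SDK calls. The function name `hourly_prefixes`, the `date/hour/` prefix layout with forward slashes, and the use of UTC are all assumptions made for illustration, and the same logic carries over directly to C#.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefixes(trigger_time, hours=24):
    """Return the 'date/hour/' folder prefixes for the full hours before trigger_time.

    Hypothetical helper: assumes blobs were written under <container>/YYYY-MM-DD/HH/.
    """
    # Truncate to the start of the hour so a trigger at 07:00:03 behaves like 07:00.
    end = trigger_time.replace(minute=0, second=0, microsecond=0)
    # Walk from the oldest hour in the window up to the hour just before the trigger.
    return [
        (end - timedelta(hours=back)).strftime("%Y-%m-%d/%H/")
        for back in range(hours, 0, -1)
    ]


# Example: a run triggered at 07:00 on 2019-12-22 (UTC is an assumption).
prefixes = hourly_prefixes(datetime(2019, 12, 22, 7, 0, tzinfo=timezone.utc))
print(prefixes[0])   # oldest folder in the window
print(prefixes[-1])  # newest complete hour
```

Each run would then list and read the blobs under those prefixes, so the 7 o'clock run reuses 23 of the 24 folders the 6 o'clock run read and simply picks up one new hour.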
"My understanding of the receiving process is that every time a new event arrives in Event Hub, a signal is raised and the EventProcessorHost class is triggered to get the event data. (I'm not sure whether my understanding is right.) However, with this method an event can only be accessed once. It cannot be accessed in later receive operations because it disappears from Event Hub."
That is mostly correct: the EventProcessorHost runs on new events, and because of checkpointing each event is normally processed once. The events are not deleted when they are read, though; they stay in the hub until the retention period expires. You can play with offsets to rewind the stream, but I think my alternative is much easier.
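Since the question asked how offsets behave, here is a toy in-memory model of a single partition. It is not the Event Hubs SDK: `PartitionLog`, `Event`, and the simple integer offsets are hypothetical stand-ins (real offsets are opaque positions assigned by the service). It only illustrates that reading does not remove events, that a checkpoint is a remembered position, and that rewinding means starting from an earlier position.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    offset: int         # position in the partition (simplified to a counter here)
    enqueued: datetime  # time the hub accepted the event
    body: str


class PartitionLog:
    """Toy stand-in for one partition; events stay in the log after being read."""

    def __init__(self):
        self._events = []

    def append(self, enqueued, body):
        self._events.append(Event(len(self._events), enqueued, body))

    def read_from_offset(self, offset):
        # Reading only selects a starting point; nothing is removed.
        return [e for e in self._events if e.offset >= offset]

    def read_from_time(self, start):
        # Start from the first event enqueued at or after `start`.
        return [e for e in self._events if e.enqueued >= start]


now = datetime(2019, 12, 22, 7, 0, tzinfo=timezone.utc)
log = PartitionLog()
for hours_ago in (30, 20, 10, 1):
    log.append(now - timedelta(hours=hours_ago), f"event from {hours_ago}h ago")

# A checkpoint is a remembered offset: resume just after the last processed event.
checkpoint = 2
new_events = log.read_from_offset(checkpoint + 1)

# Rewinding ignores the checkpoint and starts from a chosen position instead,
# here everything enqueued in the last day.
last_day = log.read_from_time(now - timedelta(days=1))

print([e.body for e in new_events])
print([e.body for e in last_day])
```

In the .NET client the starting point is expressed with `EventPosition`, for example `EventPosition.FromOffset` or `EventPosition.FromEnqueuedTime`, passed when a receiver is created. Re-reading a full day every hour this way means pulling the same events from every partition 24 times, which is why the blob-based approach tends to be simpler.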
Another technology that might be useful to you, and that works well with Event Hub, is Azure Stream Analytics. It allows you to define time-based windows, but 24 hours might be a bit too much for this; I'm not sure about that.