
Poll and trigger a task in Storm Bolt

I have a bolt that loads data from a file on disk (say /blah/blah/movie_data.csv) using a method (say loadDataFromFile()) and stores it in memory. The file is refreshed every day and is guaranteed to be updated by 10 am. I want my bolt to reload all the data in the file once it has been refreshed (any time soon after 10 am).

How do I implement this? Should I poll movie_data.csv, check whether its timestamp is more recent than the last file load time, and if so trigger the loadDataFromFile() method? Or could this be done using some form of tick tuples?
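For illustration, the polling idea could be sketched as follows. This is only a minimal sketch using the standard `java.nio.file` API, not Storm itself; the class name `FileChangeDetector` and the method `needsReload()` are hypothetical. In a bolt, the check could run whenever a tick tuple arrives (tick tuples are requested by setting `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS` in the bolt's `getComponentConfiguration()`), and `loadDataFromFile()` would be called only when the check returns true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper: remembers a file's modification time between checks. */
class FileChangeDetector {
    private final Path file;
    private FileTime lastSeen; // null until the first check

    FileChangeDetector(Path file) {
        this.file = file;
    }

    /**
     * Returns true if the file has never been checked, or if its modification
     * time is newer than the one recorded on the previous check.
     */
    boolean needsReload() {
        FileTime current;
        try {
            current = Files.getLastModifiedTime(file);
        } catch (IOException e) {
            // Surface I/O problems (e.g. a missing file) as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        if (lastSeen == null || current.compareTo(lastSeen) > 0) {
            lastSeen = current;
            return true;
        }
        return false;
    }
}
```

With a tick interval of, say, 60 seconds, the bolt would pick up the refreshed file within about a minute of it being written, without needing to know the 10 am schedule at all.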

I don't need code/pseudo-code, just some pointers to libraries/classes that might be relevant. I already looked at tick tuples, but they don't seem to help here since I want to reload the file at a particular time, not every T seconds.

Is there a way for the bolt to receive a tick tuple at a scheduled time (say 10 am daily)? That way, I won't have to use polling to check whether the file has been refreshed.
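Tick tuples fire at a fixed interval rather than at a wall-clock time, so one possible workaround is for the bolt to run its own timer. The sketch below uses only the JDK (`java.time` and `ScheduledExecutorService`), not any Storm API; the class `DailySchedule` and its methods are hypothetical names. It computes the delay until the next occurrence of a given time of day and schedules a task to repeat every 24 hours from then.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for running a task once a day at a given local time. */
class DailySchedule {

    /** Time from now until the next occurrence of target (today if still ahead, otherwise tomorrow). */
    static Duration untilNext(LocalTime target, LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().atTime(target);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    /** Schedules task at the next occurrence of target, then every 24 hours after that. */
    static ScheduledFuture<?> scheduleDaily(ScheduledExecutorService executor,
                                            LocalTime target,
                                            Runnable task) {
        long initialDelayMs = untilNext(target, LocalDateTime.now()).toMillis();
        long periodMs = TimeUnit.DAYS.toMillis(1);
        return executor.scheduleAtFixedRate(task, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

A bolt could create a single-threaded scheduler in `prepare()` and pass a task that calls `loadDataFromFile()`, scheduling it a few minutes after 10 am to leave room for the refresh to finish. Two caveats apply: the reload then runs on a different thread from `execute()`, so the in-memory data should be swapped in safely (for example through a `volatile` reference), and a fixed 24-hour period drifts by an hour across daylight-saving changes.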

That depends on whether you are using a Trident topology. If it is a Trident topology, you can implement a storm.trident.spout.BatchSpoutExecutor and a BatchSpoutExecutor.BatchSpoutEmitter. As I recall, the Trident examples implement these interfaces, so check them.

Hope this helps.


 