Poll and trigger a task in a Storm bolt
I have a bolt that loads data from a file on disk (say /blah/blah/movie_data.csv) using a method (say loadDataFromFile()) and stores it in memory. This file is refreshed every day by 10 am (it is guaranteed to be updated by 10 am). I want my bolt to reload all the data in the file once it has been refreshed (any time soon after 10 am).
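Whatever ends up triggering the reload, one way to keep the in-memory copy consistent is to build the new data set completely and then swap it in with a single reference assignment, so that execute() never sees a half-loaded map. The plain-Java sketch below assumes, purely for illustration, that each line of the file is a two-column "key,value" record with no header; MovieCache and its methods are hypothetical names, not part of Storm.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory holder for the file contents.
class MovieCache {
    // Readers always see either the old map or the new one, never a partial load.
    private final AtomicReference<Map<String, String>> data =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // Assumption: each line is "key,value"; adapt the parsing to the real file format.
    public void loadDataFromFile(Path file) throws IOException {
        Map<String, String> fresh = new HashMap<>();
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (String line : lines) {
            int comma = line.indexOf(',');
            if (comma < 0) {
                continue; // skip lines without a separator
            }
            fresh.put(line.substring(0, comma), line.substring(comma + 1));
        }
        // Publish the fully built map in one step; the old map is simply dropped.
        data.set(Collections.unmodifiableMap(fresh));
    }

    public String get(String key) {
        return data.get().get(key);
    }
}
```

If the file cannot be read, loadDataFromFile() throws before the swap, so the bolt keeps serving the previous day's data.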
How do I implement this? Should I poll movie_data.csv, check whether its timestamp is more recent than the last file load time, and if so, trigger the loadDataFromFile() method? Or could this be done using some form of tick tuples?
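The polling approach described above amounts to comparing the file's last-modified time with the timestamp recorded at the last successful load. A minimal plain-Java sketch, assuming the check is called periodically (for example from a tick-tuple handler); ReloadPolicy, newerVersion and recordLoad are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: detects whether the file changed since the last successful load.
class ReloadPolicy {
    private volatile FileTime lastLoaded = FileTime.fromMillis(0);

    // Returns the file's mtime if it is newer than the last load, otherwise null.
    FileTime newerVersion(Path file) throws IOException {
        FileTime modified = Files.getLastModifiedTime(file);
        return modified.compareTo(lastLoaded) > 0 ? modified : null;
    }

    // Call only after the load succeeded, passing the mtime read before loading,
    // so that a failed load is retried on the next check.
    void recordLoad(FileTime modified) {
        lastLoaded = modified;
    }
}
```

Typical use on each check: read t = newerVersion(path); if t is not null, call loadDataFromFile() and then recordLoad(t). Recording the timestamp read before the load means a write that lands during the load is still picked up on the next check.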
I don't need code or pseudo-code, just some pointers to libraries/classes that might be relevant. I already looked at the tick tuple, but it does not seem to help here, since I want to refresh the file at a particular time, not every T seconds.
Is there a way for the bolt to receive a tick tuple at a scheduled time (say 10 am daily)? That way, I won't have to use polling to check whether the file has been refreshed.
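Storm's tick tuples are interval-based (configured in seconds), so a fixed time-of-day trigger is not something they provide directly. One plain-Java alternative, which is not the approach suggested in the answer, is a ScheduledExecutorService that fires at the next occurrence of a chosen local time and then reschedules itself. DailySchedule and delayUntilNext are hypothetical names, and the target time is left to the caller (something like 10:05 would match "soon after 10 am").

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a task once a day at a fixed local time.
class DailySchedule {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final LocalTime at;
    private final Runnable task;

    DailySchedule(LocalTime at, Runnable task) {
        this.at = at;
        this.task = task;
    }

    // Time from 'now' until the next 'at': today if still ahead, otherwise tomorrow.
    static Duration delayUntilNext(ZonedDateTime now, LocalTime at) {
        ZonedDateTime next = now.with(at);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Schedules a single run, then reschedules after it finishes; recomputing the
    // delay each time avoids the drift a fixed 24-hour period has across DST changes.
    void start() {
        long delayMs = delayUntilNext(ZonedDateTime.now(), at).toMillis();
        scheduler.schedule(() -> {
            try {
                task.run();
            } finally {
                if (!scheduler.isShutdown()) {
                    start();
                }
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

In a bolt, start() would typically be called from prepare() and stop() from cleanup(). Because the task then runs on its own thread, the in-memory data should be swapped atomically, and checking the file's timestamp inside the task still guards against a refresh that arrives late.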
That depends on whether you are using a Trident topology. If it is a Trident topology, you can implement a storm.trident.spout.BatchSpoutExecutor and a BatchSpoutExecutor.BatchSpoutEmitter.
As I recall, the Trident examples implement these interfaces, so they are worth checking.
Hope this helps.