
Poll and trigger a task in Storm Bolt

I have a bolt that loads data from a file on disk (say /blah/blah/movie_data.csv ) using a method (say loadDataFromFile() ), and stores it in memory. This file gets refreshed every day by 10 am (guaranteed to get updated by 10 am). I want my bolt to reload all the data in the file once it has been refreshed (any time soon after 10 am).

How do I implement this? Should I poll movie_data.csv, check whether its last-modified timestamp is more recent than the last file load time, and if so, trigger the loadDataFromFile() method? Or could this be done using some form of tick tuples?
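For reference, the polling check described above could look roughly like this in plain Java. This is only a sketch: the class name `FileReloadCheck`, the method `reloadIfChanged()`, and the reload counter are hypothetical names made up for illustration, `loadDataFromFile()` is a stub standing in for the real loader, and nothing here touches the Storm API. The caller (for example the bolt, on whatever schedule it chooses) would have to invoke the check periodically.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical helper: reloads the file only when its modification time has advanced.
class FileReloadCheck {
    private final Path file;
    // Modification time of the file as of the last successful load (epoch 0 = never loaded).
    private FileTime lastLoaded = FileTime.fromMillis(0L);
    private int reloadCount = 0;

    FileReloadCheck(Path file) {
        this.file = file;
    }

    // Returns true if the file was newer than the last load and a reload was triggered.
    boolean reloadIfChanged() throws IOException {
        FileTime modified = Files.getLastModifiedTime(file);
        if (modified.compareTo(lastLoaded) > 0) {
            loadDataFromFile();
            lastLoaded = modified;
            return true;
        }
        return false;
    }

    // Stub for the real loader; here it only counts how often it was called.
    void loadDataFromFile() {
        reloadCount++;
    }

    int getReloadCount() {
        return reloadCount;
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for /blah/blah/movie_data.csv so the sketch runs anywhere.
        Path p = Files.createTempFile("movie_data", ".csv");
        FileReloadCheck check = new FileReloadCheck(p);
        System.out.println("first check reloaded: " + check.reloadIfChanged());
        System.out.println("second check reloaded: " + check.reloadIfChanged());
        Files.deleteIfExists(p);
    }
}
```

Comparing against the file's own modification time, rather than the wall-clock time of the last load, avoids depending on the clock of whichever process writes the file being in sync with the bolt's clock.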

I don't need code/pseudo-code. Just some pointers on libraries/classes which might be relevant. I already looked at the tick tuple but it does not seem to be of help here since I want to refresh the file at a particular time, not every T seconds.

Is there a way for the bolt to receive a tick tuple at a scheduled time (say 10 am daily)? That way, I won't have to use polling to check if the file has been refreshed.

That depends on whether you are using a Trident topology or not. If it's a Trident topology, you can implement a storm.trident.spout.BatchSpoutExecutor and a BatchSpoutExecutor.BatchSpoutEmitter. As I recall, the Trident examples implement these interfaces, so check them.

Hope that this will help.

