
Data collection frequency strategy

I have a question and am wondering whether anyone has solved this problem effectively. I am developing a collector (call it A) that collects data from a source (call it B), which in turn collects data from somewhere else. B collects every 5 minutes. What frequency or strategy should A use? If A polls twice as often as B, it will end up with duplicate data for an interval. If it polls at the same rate as B, it may get stale data when the two collection times coincide exactly. Has anyone solved this problem?

If the data you collect from source B carries some kind of timestamp, you could use it to exclude duplicate results: only accept data whose timestamp is more recent than the latest one you have already stored.

I have done this before by converting the date/time to a Unix epoch timestamp and then checking that the latest data has a larger value than the previous one, ignoring it otherwise. This would let you run your data collection at twice B's rate if you wanted to.
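As a rough illustration of the idea, here is a minimal Python sketch. It assumes each record from B is a dict with an ISO 8601 `timestamp` field; that field name, the `to_epoch` helper, and the `Deduplicator` class are all hypothetical names made up for this example, not anything B is known to provide.

```python
from datetime import datetime, timezone


def to_epoch(timestamp: str) -> float:
    """Convert an ISO 8601 string to Unix epoch seconds.

    Assumption: values without a UTC offset are treated as UTC.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


class Deduplicator:
    """Remembers the newest epoch seen so far and drops anything not newer."""

    def __init__(self):
        self.last_seen = float("-inf")

    def accept(self, records):
        """Return only the records whose timestamp is newer than any seen before."""
        # "timestamp" is an assumed field name on each record.
        stamped = [(to_epoch(r["timestamp"]), r) for r in records]
        fresh = [r for epoch, r in stamped if epoch > self.last_seen]
        # Advance the high-water mark only after filtering, so several
        # records sharing one new timestamp in the same poll are all kept.
        self.last_seen = max([self.last_seen] + [epoch for epoch, _ in stamped])
        return fresh
```

With something like this in place, A could poll more often than B (say every 2.5 minutes) and pass each batch through `accept`; a poll that returns only data already seen simply yields an empty list.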

