
Report scheduler system design using the database as master

Problem

  • we have ~50k scheduled financial reports that we periodically deliver to clients via email
  • each report has its own delivery frequency (date and time, as configured by the client)
    • weekly
    • daily
    • hourly
    • weekdays only
    • etc.
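
To make the frequencies above concrete, here is a minimal sketch of how a next delivery time might be derived from the previous one. The function name `next_run_time` and the frequency strings are assumptions for illustration, not part of the existing system; real client configurations would likely need time zones and holiday calendars as well.

```python
from datetime import datetime, timedelta


def next_run_time(last_run: datetime, frequency: str) -> datetime:
    """Return the next delivery time after last_run for a named frequency."""
    if frequency == "hourly":
        return last_run + timedelta(hours=1)
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "weekdays":
        candidate = last_run + timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday; skip Saturday and Sunday.
        while candidate.weekday() >= 5:
            candidate += timedelta(days=1)
        return candidate
    raise ValueError(f"unknown frequency: {frequency}")
```

For example, a weekdays-only report last delivered on a Friday at 09:00 would next run on the following Monday at 09:00.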

Current architecture

  • we have a table called report_metadata that holds report information

    • report_id
    • report_name
    • report_type
    • report_details
    • next_run_time
    • last_run_time
    • etc...
  • every week, all 6 instances of our scheduler service poll the report_metadata table, extract metadata for all reports that are due for delivery in the following week, and put them in an in-memory timed queue.

  • Only in the master/leader instance (which is one of the 6 instances):

    • data in the timed-queue is popped at the appropriate time
    • processed
    • a few API calls are made to get a fully-complete and current/up-to-date report
    • and the report is emailed to clients
  • the other 5 instances do nothing - they simply exist for redundancy
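
For readers unfamiliar with the term, the in-memory timed queue described above could look roughly like the following sketch. The class `TimedQueue` and its methods are hypothetical; the actual service may use a different structure. It keeps (run_time, report_id) pairs in a heap so the earliest entry is always popped first.

```python
import heapq


class TimedQueue:
    """Min-heap of (run_time, report_id); the earliest run_time pops first."""

    def __init__(self):
        self._heap = []

    def push(self, run_time, report_id):
        heapq.heappush(self._heap, (run_time, report_id))

    def pop_due(self, now):
        """Pop and return the ids of all reports whose run_time is at or before now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

The weakness this illustrates is that the queue lives in one process: any change to report_metadata after the weekly load is invisible until the next load.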

Proposed architecture

Numbers:

  • db can handle up to 1000 concurrent connections - which is good enough
  • the total number of existing reports (~50k) is unlikely to grow much in the foreseeable future

Solution:

  • instead of polling the report_metadata db every week and storing data in a timed-queue in-memory, all 6 instances will poll the report_metadata db every 60 seconds (with a 10 s offset for each instance)
  • as a result, some instance will attempt to pick up work every 10 seconds
  • data for any single report whose next_run_time is in the past is extracted, the table row is locked, and the report is processed and delivered to clients by that specific instance
  • after the report is successfully processed, the table row is unlocked and the report's next_run_time, last_run_time, etc. are updated
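
The 10-second figure follows from spreading 6 instances evenly over a 60-second interval. The sketch below works through that arithmetic; the function names are made up for illustration, and it assumes the offsets are always interval / instances, which is how the numbers above come out.

```python
def poll_offsets(instances=6, interval=60):
    """Start offset in seconds for each instance, spread evenly across one interval."""
    step = interval / instances
    return [i * step for i in range(instances)]


def merged_poll_times(instances=6, interval=60, horizon=120):
    """All poll times (seconds) across instances within [0, horizon), sorted."""
    times = []
    for offset in poll_offsets(instances, interval):
        t = offset
        while t < horizon:
            times.append(t)
            t += interval
    return sorted(times)
```

With the defaults, consecutive polls in the merged schedule are exactly 10 seconds apart. With a different instance count the gap changes (4 instances give 15 seconds), so the offsets would need to be recomputed whenever instances are added or removed.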

In general, the database serves as the master: individual instances of the process can work independently, and the database ensures they do not overlap.

It would help if you could let me know:

  • whether the proposed architecture is a good/correct solution
  • which table columns can/should be indexed
  • any other considerations

I have worked on a different kind of scheduler, for a program that reported analyses at specific moments of the month or week. What I did was group the reports into so-called business-cycle-based time moments, such as "start of a new week", "start of the month", or "start/end of a D/W/M/Q/Y". So I standardised the moments at which the reports are sent and added the IDs to a table that carries the details of each report. You then add things to a cycle or remove them when needed; you could do this by adding a tag such as EOD (end of day), EOM (end of month), SOW (start of week), etc.

So you could index the moments at which the clients want to receive the reports and build on that. I hope this comment helps you with your challenge.
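
A rough sketch of the tag idea, under stated assumptions: the tags EOD, EOM and SOW come from the description above, SOM (start of month) is an abbreviation added here by analogy, Monday is assumed to be the start of the week, and the report IDs in `REPORTS_BY_TAG` are invented sample data.

```python
from datetime import date, timedelta


def cycle_tags(day: date):
    """Return the business-cycle tags that fire on the given calendar day."""
    tags = ["EOD"]  # every day has an end of day
    if day.weekday() == 0:
        tags.append("SOW")  # assumption: the week starts on Monday
    if day.day == 1:
        tags.append("SOM")
    if (day + timedelta(days=1)).day == 1:
        tags.append("EOM")  # tomorrow is the 1st, so today ends the month
    return tags


# Hypothetical mapping from cycle tag to the report ids attached to it.
REPORTS_BY_TAG = {"EOD": [101, 102], "SOW": [103], "EOM": [104]}


def reports_due(day, reports_by_tag=REPORTS_BY_TAG):
    """Collect the report ids attached to every tag that fires on this day."""
    due = []
    for tag in cycle_tags(day):
        due.extend(reports_by_tag.get(tag, []))
    return due
```

Adding or removing a report then means editing one tag's list rather than recomputing individual schedules.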

Having all 6 instances simply query the metadata table to find the next report to process, as you are suggesting, seems good.

It seems odd, though, to have a staggered approach with a check once every 60 seconds, offset by 10 seconds for each of your servers. You have 6 servers now, but that may change. Also, I don't understand the "locking" you are suggesting. Why not simply set a flag on the row, such as [State] = "processing"? Then the next scheduler knows to skip that row and move on to the next available one. Once a run is processed, you can simply update a [Date_last_processed] column, or maybe something like [last_cycle_complete] = 'YES'.
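
The flag approach could be sketched as below. This is only an illustration: it uses Python's built-in sqlite3 so that it runs anywhere, the `state` column and the function names are assumptions, timestamps are plain integers, and last_run_time stands in for the [Date_last_processed] column. The key point is the guarded UPDATE: the flag is only set if the row is still idle, so two instances cannot both claim the same report.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns the sketch needs.
SCHEMA = """
CREATE TABLE report_metadata (
    report_id     INTEGER PRIMARY KEY,
    state         TEXT NOT NULL DEFAULT 'idle',
    next_run_time INTEGER NOT NULL,
    last_run_time INTEGER
)
"""


def claim_next_report(conn, now):
    """Mark one due, idle report as 'processing' and return its id, or None."""
    row = conn.execute(
        "SELECT report_id FROM report_metadata "
        "WHERE state = 'idle' AND next_run_time <= ? "
        "ORDER BY next_run_time LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    # The state = 'idle' guard makes the claim safe: if another instance
    # claimed the row first, rowcount is 0 and this call reports no work.
    cur = conn.execute(
        "UPDATE report_metadata SET state = 'processing' "
        "WHERE report_id = ? AND state = 'idle'",
        (row[0],),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


def complete_report(conn, report_id, now, next_run):
    """Release the flag and record when the report ran and when it runs next."""
    conn.execute(
        "UPDATE report_metadata "
        "SET state = 'idle', last_run_time = ?, next_run_time = ? "
        "WHERE report_id = ?",
        (now, next_run, report_id),
    )
    conn.commit()
```

One thing a flag does not handle by itself is an instance that crashes mid-run: the row stays in "processing" until something resets it, so a timeout on the flag may be worth considering.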

Alternatively, you could have one server process go through the table and, for each available row, send it off to one of the instances in round-robin fashion (or keep track of which instances are busy and which aren't).
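
As a small illustration of the round-robin variant, the following sketch pairs due reports with instances in rotation. The function name and the instance labels are hypothetical, and it ignores whether an instance is currently busy.

```python
from itertools import cycle


def assign_round_robin(report_ids, instances):
    """Pair each report id with an instance, cycling through instances in order."""
    return list(zip(report_ids, cycle(instances)))
```

For instance, four reports spread over three instances would send the fourth report back to the first instance.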
