
report scheduler system design using database as master

Original post 2021-12-13 12:10:54, tags: sql, db2, distributed-system, taskscheduler, database-indexes

Problem

We have ~50k scheduled financial reports that we periodically deliver to clients via email.
Each report has its own delivery frequency (date and time, as configured by the client):
- weekly
- daily
- hourly
- weekdays only
- etc.
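
To make the frequencies above concrete, here is a minimal sketch of how a next_run_time could be derived from the last run. The function name, the frequency strings, and the rule that "weekdays only" simply skips Saturday and Sunday are assumptions for illustration, not something stated in the question.

```python
from datetime import datetime, timedelta

def next_run_time(frequency: str, last_run: datetime) -> datetime:
    """Return the next delivery time after last_run (hypothetical helper)."""
    if frequency == "hourly":
        return last_run + timedelta(hours=1)
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "weekdays":
        candidate = last_run + timedelta(days=1)
        # weekday() is 0 for Monday; 5 and 6 are Saturday and Sunday
        while candidate.weekday() >= 5:
            candidate += timedelta(days=1)
        return candidate
    raise ValueError(f"unknown frequency: {frequency}")

friday = datetime(2021, 12, 10, 9, 0)
print(next_run_time("weekdays", friday))  # the following Monday, same time
```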

Current architecture

We have a table called report_metadata that holds report information:
- report_id
- report_name
- report_type
- report_details
- next_run_time
- last_run_time
- etc...
Every week, all 6 instances of our scheduler service poll the report_metadata table, extract metadata for all reports that are due for delivery in the following week, and put it in an in-memory timed queue.
Only the master/leader instance (one of the 6) does the actual work:
- data in the timed queue is popped at the appropriate time
- it is processed
- a few API calls are made to get a complete, up-to-date report
- the report is emailed to clients
The other 5 instances do nothing; they exist only for redundancy.
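
For readers unfamiliar with the term, an in-memory timed queue like the one described above can be sketched with a heap ordered by run time. The class and method names are hypothetical; the real service may be built quite differently.

```python
import heapq
from datetime import datetime

class TimedQueue:
    """Min-heap of (run_at, report_id); the earliest run time is always on top."""

    def __init__(self):
        self._heap = []

    def push(self, run_at: datetime, report_id: int) -> None:
        heapq.heappush(self._heap, (run_at, report_id))

    def pop_due(self, now: datetime) -> list:
        """Remove and return the ids of all reports whose run time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due

queue = TimedQueue()
queue.push(datetime(2021, 12, 13, 10, 0), 42)
queue.push(datetime(2021, 12, 13, 9, 0), 7)
queue.push(datetime(2021, 12, 14, 9, 0), 99)
print(queue.pop_due(datetime(2021, 12, 13, 12, 0)))
```

Because the queue lives only in the memory of each instance, it has to be rebuilt from the table after a restart, which is part of what the proposal below tries to avoid.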

Proposed architecture

Numbers:

The db can handle up to 1000 concurrent connections, which is good enough.
The total number of reports (~50k) is unlikely to grow much in the foreseeable future.

Solution:

Instead of polling the report_metadata table every week and storing data in an in-memory timed queue, all 6 instances will poll it every 60 seconds (with a 10 s offset between instances).
On average, the scheduler service as a whole will attempt to pick up work every 10 seconds.
Any single report whose next_run_time is in the past is picked up: its table row is locked, and the report is processed and delivered to clients by that specific instance.
After the report is successfully processed, the row is unlocked and next_run_time, last_run_time, etc. for the report are updated.
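
The staggered polling can be sketched as a small calculation of how long each instance should sleep before its next poll. This assumes instances are numbered 0 to 5 and share a reasonably synchronized clock; the function and parameter names are made up for illustration.

```python
def seconds_until_next_poll(instance_index: int, now: float,
                            period: float = 60.0, instances: int = 6) -> float:
    """Seconds to wait until this instance's next slot in the polling cycle."""
    # Instance i owns the slot i * (period / instances) seconds into each cycle.
    offset = instance_index * period / instances
    # Python's % returns a non-negative result for a positive divisor.
    return (offset - now) % period

# 15 seconds into a cycle: instance 2 polls in 5 s, instance 1 waits for the next cycle.
print(seconds_until_next_poll(2, now=15.0))
print(seconds_until_next_poll(1, now=15.0))
```

With 6 instances and a 60 s period the slots are 10 s apart, which is where the "every 10 seconds" figure comes from. If the number of instances changes, the offsets have to be recomputed.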

In general, the database serves as the master: individual instances of the process work independently, and the database ensures they do not overlap.

It would help if you could let me know:

- whether the proposed architecture is a good/correct solution
- which table columns can/should be indexed
- any other considerations
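
On the indexing question: the poll described above filters on next_run_time, so that column is a natural first candidate. The sketch below uses SQLite purely as a stand-in (the question is tagged db2, whose planner and syntax differ), and the index name and column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_metadata ("
    " report_id INTEGER PRIMARY KEY,"
    " report_name TEXT,"
    " next_run_time TEXT,"  # ISO-8601 strings sort chronologically
    " last_run_time TEXT)"
)
# Hypothetical index supporting the "which report is due?" poll.
conn.execute("CREATE INDEX idx_report_next_run ON report_metadata (next_run_time)")

poll_sql = (
    "SELECT report_id FROM report_metadata"
    " WHERE next_run_time <= ? ORDER BY next_run_time LIMIT 1"
)
plan = conn.execute("EXPLAIN QUERY PLAN " + poll_sql,
                    ("2021-12-13T12:00:00",)).fetchall()
# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should mention idx_report_next_run, meaning the poll no longer has to scan all ~50k rows. Checking the equivalent plan on the real database is still advisable.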

2 answers

I have worked on a different kind of scheduler, for a program that reported analyses at a specific moment of the month/week. What I did was group the reports by so-called business-cycle moments, such as "start of a new week", "start of the month", or "start/end of a D/W/M/Q/Y". So I standardised the moments at which reports are sent and added the report IDs to a table that carries the details of each report. You can then add things to a cycle, or remove them when needed, for example by adding a tag such as EOD (end of day), EOM (end of month), SOW (start of week), etc.

So you could index the moments at which the clients want to receive their reports and build on that. I hope this comment helps you with your challenge.
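
A rough sketch of the tag idea: each calendar day fires a set of cycle tags, and every report subscribed to one of those tags is due. EOD, EOM and SOW come from the answer; the SOM tag, the function names, and the example subscriptions are assumptions.

```python
import calendar
from datetime import date

def cycle_tags_for(day: date) -> set:
    """Return the business-cycle tags that fire on the given day."""
    tags = {"EOD"}  # end of day fires every day
    if day.weekday() == 0:  # Monday
        tags.add("SOW")
    if day.day == 1:
        tags.add("SOM")  # start of month (hypothetical tag name)
    if day.day == calendar.monthrange(day.year, day.month)[1]:
        tags.add("EOM")
    return tags

def reports_due(day: date, reports_by_tag: dict) -> set:
    """Union of all report ids subscribed to any tag firing on that day."""
    due = set()
    for tag in cycle_tags_for(day):
        due |= set(reports_by_tag.get(tag, ()))
    return due

subscriptions = {"EOD": [1, 2], "SOW": [3], "EOM": [4]}
print(sorted(reports_due(date(2021, 12, 13), subscriptions)))
```

The table then only needs an index on the tag column, since the number of distinct delivery moments is small.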

Having all 6 instances simply query the metadata table to find the next report to process, as you are suggesting, seems good.

It seems odd, though, to stagger the checks (once every 60 seconds, offset by 10 seconds per server). You have 6 servers now, but that may change. Also, I don't understand the "locking" you are suggesting. Why not simply set a flag on the row, such as [State] = "processing", so that the next scheduler knows to skip that row and move on to the next available one? Once a run is processed, you can simply update a [Date_last_processed] column, or maybe set something like [last_cycle_complete] = 'YES'.
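
The flag approach could look roughly like this. SQLite again serves only as a stand-in for the real database, and the 'idle' state value, the lowercase column names, and both function names are assumptions. The important detail is that the UPDATE repeats the state check, so if two schedulers pick the same row only one of them wins.

```python
import sqlite3

def claim_next_report(conn, now: str):
    """Mark one due, idle report as 'processing'; return its id, or None."""
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT report_id FROM report_metadata"
            " WHERE state = 'idle' AND next_run_time <= ?"
            " ORDER BY next_run_time LIMIT 1", (now,)).fetchone()
        if row is None:
            return None
        # Conditional update: affects 0 rows if another scheduler got there first.
        cur = conn.execute(
            "UPDATE report_metadata SET state = 'processing'"
            " WHERE report_id = ? AND state = 'idle'", (row[0],))
        return row[0] if cur.rowcount == 1 else None

def finish_report(conn, report_id: int, now: str, next_run: str) -> None:
    """Release the row and record when it was processed and when it runs next."""
    with conn:
        conn.execute(
            "UPDATE report_metadata SET state = 'idle',"
            " date_last_processed = ?, next_run_time = ? WHERE report_id = ?",
            (now, next_run, report_id))
```

One thing this sketch does not handle: if an instance crashes mid-run, its row stays in 'processing', so some timeout or cleanup would be needed.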

Alternatively, you could have one server process go through the table and send each available row off to one of the instances in round-robin fashion (or keep track of which instances are busy and which aren't).
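
The round-robin variant can be sketched in a few lines. The function name and instance labels are hypothetical, and actually sending the work to an instance (over a queue, HTTP, etc.) is left out.

```python
from itertools import cycle

def assign_round_robin(report_ids, instances):
    """Distribute report ids over the instances in turn."""
    assignment = {name: [] for name in instances}
    # cycle() repeats the instance list forever; zip stops when the reports run out.
    for report_id, name in zip(report_ids, cycle(instances)):
        assignment[name].append(report_id)
    return assignment

print(assign_round_robin([101, 102, 103, 104, 105], ["worker-a", "worker-b"]))
```

Note that this makes the dispatching process a single point of failure, which the database-as-master proposal does not have.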


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1
2 2021-12-13 12:32:04

solution2
0 2021-12-26 00:09:20
