
Predict the ETA of a job that depends on other jobs

I want to predict the ETA of a scheduled job that depends on the completion of other jobs. There are many such jobs.

For example:

[image: enter image description here]

Now I want to get the ETA of job D, for which C, B, A and PA are preceding jobs. D is not triggered until A, B, C and PA have all completed. Here C also did not start on time, so both C and D are delayed.

In short, to predict the ETA of job "D", my model should look at the completion of jobs A, B, C and PA and then predict. If any of the preceding jobs has not started yet, it should automatically work out their ETAs first and then predict the ETA of D.

Any help is appreciated.

This is a classic graph problem. People tend to confuse it with an ML problem. It is NOT. The same thing happened in my org. Regression and similar techniques are the wrong approach here. I had the same use case and was able to develop the code for it.

1. Data representation:
As in the OP's image, the data needs to be represented in a vertical data format. For background, see: https://www.developer.com/db/article.php/3736011/using-vertical-and-horizontal-table-structures-in-oracle.htm
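As a rough illustration of what a vertical layout could look like for job dependencies, the sketch below turns one row per job (with one column per parent) into one row per (job, parent) pair. The column names `job`, `parent_1`, `parent_2` and the helper `to_vertical` are assumptions made for this example, not something taken from the OP's data.

```python
# Hypothetical "horizontal" layout: one row per job, one column per parent slot.
horizontal = [
    {"job": "A", "parent_1": "B", "parent_2": "C"},
    {"job": "B", "parent_1": None, "parent_2": None},
    {"job": "C", "parent_1": None, "parent_2": None},
]


def to_vertical(horizontal_rows):
    """Return (job, parent) pairs; a job without parents yields (job, None)."""
    vertical_rows = []
    for row in horizontal_rows:
        # Collect every non-empty parent column of this row.
        parents = [value for key, value in row.items()
                   if key.startswith("parent_") and value is not None]
        if not parents:
            vertical_rows.append((row["job"], None))
        for parent in parents:
            vertical_rows.append((row["job"], parent))
    return vertical_rows


vertical = to_vertical(horizontal)
print(vertical)
# [('A', 'B'), ('A', 'C'), ('B', None), ('C', None)]
```

With one dependency per row, adding a further parent to a job means adding a row rather than a column.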

2. Code Logic:
You will mainly need two functions: a) create_job_dependency and b) calculate_eta.
Loop through your data from the top and keep calling function a), i.e. keep building job_dependency, until you reach the last job / end job, the one that no other job depends on. job_dependency is a simple dictionary with the job name as key and a list of its parent jobs as value. E.g. if A depends on B and C, you should end up with the data structure {A: [B, C]}.
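A minimal sketch of this step, assuming the input is a list of (job, parent) tuples with `None` as the parent of a job that has no dependencies. That row format and the helper `find_end_jobs` are assumptions for illustration; only the name `create_job_dependency` and the shape of the dictionary come from the description above.

```python
def create_job_dependency(rows):
    """Build {job: [parent jobs]} from (job, parent) rows."""
    job_dependency = {}
    for job, parent in rows:
        # Make sure the job has an entry even if it has no parents.
        parents = job_dependency.setdefault(job, [])
        if parent is not None:
            if parent not in parents:
                parents.append(parent)
            # Parents that never appear as a job still get an entry.
            job_dependency.setdefault(parent, [])
    return job_dependency


def find_end_jobs(job_dependency):
    """Return the jobs that no other job depends on (the end jobs)."""
    all_parents = {p for parents in job_dependency.values() for p in parents}
    return [job for job in job_dependency if job not in all_parents]


# Example from the text: A depends on B and C.
rows = [("A", "B"), ("A", "C"), ("B", None), ("C", None)]
job_dependency = create_job_dependency(rows)
print(job_dependency)                 # {'A': ['B', 'C'], 'B': [], 'C': []}
print(find_end_jobs(job_dependency))  # ['A']
```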

Once you reach the last job, call the calculate_eta function. This is basically a DFS (depth-first search) implementation. Keep walking job_dependency recursively until you find a job that has already succeeded, and return the ETA from there. Every recursive call returns an ETA. For any one node, store the ETAs of its parents in a list, take the max value, and add the node's average run time.

E.g. A (referred to as the node) depends on B and C. B is in RUNNING state and C is in SUCCESS state. When you do the DFS, it will go to B, see that it is running, and return the start time of B + the average run time of B. Then it will go to C and return its last success time. In addition, you also have the expected start time of A. Take the max of these values, add the run time of A, and return the result. Keep doing this recursively and you will have the ETAs of all pending jobs up to the last SLAEndJob.
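The recursion described above might be sketched as follows. This is not the answerer's code: the `Job` dataclass, its field names, the status strings and the `cache` argument (memoisation so a shared parent is computed only once) are all assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Job:  # hypothetical record; field names are assumptions
    status: str                       # "SUCCESS", "RUNNING" or "PENDING"
    avg_runtime: timedelta
    expected_start: Optional[datetime] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None


def calculate_eta(name, job_dependency, jobs, cache=None):
    """Depth-first ETA: max over parent ETAs and expected start, plus run time."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    job = jobs[name]
    if job.status == "SUCCESS":
        eta = job.end_time                      # last success time
    elif job.status == "RUNNING":
        eta = job.start_time + job.avg_runtime  # start + average run time
    else:
        # Pending: recurse into every parent first (DFS).
        candidates = [calculate_eta(parent, job_dependency, jobs, cache)
                      for parent in job_dependency.get(name, [])]
        if job.expected_start is not None:
            candidates.append(job.expected_start)
        # Assumes a pending job has at least one parent or an expected start.
        eta = max(candidates) + job.avg_runtime
    cache[name] = eta
    return eta


# Example from the text: A depends on B (RUNNING) and C (SUCCESS).
job_dependency = {"A": ["B", "C"], "B": [], "C": []}
jobs = {
    "A": Job("PENDING", timedelta(minutes=20),
             expected_start=datetime(2024, 1, 1, 9, 15)),
    "B": Job("RUNNING", timedelta(minutes=30),
             start_time=datetime(2024, 1, 1, 9, 0)),
    "C": Job("SUCCESS", timedelta(minutes=10),
             end_time=datetime(2024, 1, 1, 9, 10)),
}
eta_a = calculate_eta("A", job_dependency, jobs)
print(eta_a)  # 2024-01-01 09:50:00
```

Here B is expected to finish at 09:30, which is later than both C's success time (09:10) and A's expected start (09:15), so A's ETA is 09:30 plus its 20-minute average run time. The sketch does not handle a running job that has already overrun its average run time, nor cycles in the dependency data.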

Unfortunately, I can't share the code right now as I work in fintech. However, it is quite a beautiful problem, and within a couple of days you should be able to code up the logic I described.
