
How to run jobs in parallel in PySpark?

I am trying to run jobs in parallel. Can you please help me with how to do this?

Example:

Job       Depends on
A         (independent)
B         (independent)
C         A
D         B

You can see here that Jobs A and B are independent, so they will run at the same time. C and D depend on A and B respectively, so each of them will run after its parent job completes. Suppose A takes 10 min and B takes 15 min; then C should start immediately after A completes.

Can we create logic for this scenario? Please let me know if you need more information.

I am not sure what orchestration tool you are using, but you can create a job something like the one below; this is what I follow:

Create a rule-based job such that C is triggered whenever A has new data (and likewise D whenever B has new data).
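If you don't have an external orchestration tool and want to do this directly from the PySpark driver (as the question asks), one common alternative is to submit each independent chain from its own Python thread: Spark actions issued from separate threads on the same SparkSession run concurrently. Below is a minimal sketch of that idea, not the rule-based setup described above; the job_a/job_b/job_c/job_d functions are hypothetical placeholders for your real jobs.

from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallel-jobs").getOrCreate()

def job_a():
    # placeholder work for independent job A
    return spark.range(10).count()

def job_b():
    # placeholder work for independent job B
    return spark.range(20).count()

def job_c(a_result):
    # C depends on A, so it is only called with A's finished result
    return a_result * 2

def job_d(b_result):
    # D depends on B
    return b_result * 2

def chain_a_then_c():
    # C starts immediately after A completes, regardless of B
    return job_c(job_a())

def chain_b_then_d():
    return job_d(job_b())

with ThreadPoolExecutor(max_workers=2) as pool:
    # the two chains run in parallel; order is preserved within each chain
    fut_ac = pool.submit(chain_a_then_c)
    fut_bd = pool.submit(chain_b_then_d)
    print(fut_ac.result(), fut_bd.result())

If A takes 10 min and B takes 15 min, the A→C chain does not wait for B; total runtime is roughly the longer of the two chains rather than the sum of all four jobs.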
