
Can we start a new thread in foreachRDD in Spark Streaming?

I want to start a child thread inside foreachRDD.

My situation is:

The job reads from an HDFS directory continuously, and every 100 batches I want to launch a model-training task (I will take a snapshot of the RDDs at that point and train on it). The training task takes a very long time (about 2 hours), and I don't want it to interfere with reading new batches of data.

Is starting a new child thread a good solution? Could the child thread use the SparkContext from the main thread and access the RDDs created there?
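To make the concern concrete, here is a minimal sketch (plain Python, no Spark) of handing a long task to a daemon thread so the main loop is not blocked. The names `train_model` and `snapshot` are hypothetical placeholders for the real training task and the cached RDD snapshot.

```python
import threading
import time

def train_model(snapshot):
    """Stand-in for the long-running training task (hypothetical)."""
    time.sleep(0.1)  # pretend this takes hours
    print("training finished on", len(snapshot), "records")

# Inside foreachRDD you would cache/collect a snapshot of the data,
# then hand it to a background thread so the streaming loop can keep
# consuming new batches while training runs.
snapshot = [1, 2, 3]  # placeholder for the snapshotted data
t = threading.Thread(target=train_model, args=(snapshot,), daemon=True)
t.start()

# The main thread is free to continue reading new batches immediately.
print("main thread not blocked")
t.join()
```

Marking the thread as a daemon means it will not keep the process alive on shutdown; whether that is desirable for a 2-hour training job is a design choice worth weighing.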

You don't need to start a new thread inside RDD operations. To start a new job every hundred batches, you can add a batch listener that counts completed batches and launches the new job whenever the count reaches a multiple of 100. BatchListener Example
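Spark's actual listener API aside, the counting logic the answer describes can be sketched in plain Python. In real Spark Streaming code this would live in a `StreamingListener`'s `onBatchCompleted` hook; the `BatchCounter` class and its `action` callback here are illustrative, not part of any Spark API.

```python
# Sketch: count completed batches and fire a callback on every N-th one.
class BatchCounter:
    def __init__(self, every, action):
        self.every = every      # trigger interval, e.g. 100 batches
        self.action = action    # job to launch (hypothetical callback)
        self.count = 0

    def on_batch_completed(self):
        """Call this once per completed batch."""
        self.count += 1
        if self.count % self.every == 0:
            self.action(self.count)

fired = []
counter = BatchCounter(every=100, action=fired.append)
for _ in range(250):
    counter.on_batch_completed()

print(fired)  # the action fired after batches 100 and 200
```

In the real listener you would trigger the training job (for example, on a separate thread as discussed above) instead of appending to a list.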

