Apache Spark: Reduce number of cores during an execution

Is there a way to reduce the number of cores/executors during a certain part of a run? We don't want to overwhelm the destination datastore, but we need more cores to do the computational work efficiently.

Basically:

// want n cores here
val eventJsonRdd: RDD[(String,(Event, Option[Article]))] = eventGeoRdd.leftOuterJoin(articlesRdd)

val toSave =  eventJsonRdd.map(processEventsAndArticlesJson)

// want two cores here
toSave.saveToEs("apollobit/events")

You can try:

toSave.repartition(2).saveToEs("apollobit/events")

This will, however, entail a potentially expensive shuffle.
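If the shuffle cost is a concern, a hedged alternative (not part of the original answer) is coalesce, which reduces the partition count without a full shuffle. The trade-off is that coalesce is a narrow dependency, so it can also shrink the parallelism of the upstream join/map stage, which may defeat the goal of keeping more cores for the computation; repartition is usually safer here. A minimal sketch:

// Sketch only: coalesce(2) avoids a full shuffle, but Spark may then run the
// upstream computation with only two tasks as well, since no stage boundary
// is forced between the map and the write.
val toSaveCoalesced = toSave.coalesce(2)
toSaveCoalesced.saveToEs("apollobit/events")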

If your store supports bulk updates, you will get much better performance by calling foreachPartition and writing each partition's data in chunks rather than one record at a time, as sketched below.
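A minimal sketch of that pattern, assuming a hypothetical MyStoreClient with a bulkIndex method; the actual client, connection handling, and batch size depend on the target store (note that the elasticsearch-hadoop connector behind saveToEs already batches writes internally):

toSave.repartition(2).foreachPartition { partition =>
  // Create one client per partition (i.e. per task), not per record.
  val client = MyStoreClient.connect()           // hypothetical client factory
  try {
    // partition is an Iterator; grouped(500) yields chunks of up to 500 records.
    partition.grouped(500).foreach { batch =>
      client.bulkIndex(batch)                    // hypothetical bulk-update call
    }
  } finally {
    client.close()
  }
}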
