Is there a way to reduce the number of cores/executors during a certain part of the run? We don't want to overwhelm the end datastore, but we need more cores to do the computational work effectively.
Basically:
// want n cores here
val eventJsonRdd: RDD[(String,(Event, Option[Article]))] = eventGeoRdd.leftOuterJoin(articlesRdd)
val toSave = eventJsonRdd.map(processEventsAndArticlesJson)
// want two cores here
toSave.saveToEs("apollobit/events")
You can try:
toSave.repartition(2).saveTo...
Note, though, that this entails a potentially expensive shuffle.
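To make the effect of repartition concrete, here is a minimal sketch, assuming a local Spark master and toy integer data in place of the joined events. The names RepartitionSketch and narrowForWrite are hypothetical and only for illustration. Because repartition adds a shuffle boundary, the work before it still runs with the original partition count, and only the stage after it is limited to two concurrent tasks.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RepartitionSketch {
  // Hypothetical helper: shuffle the data into `writers` partitions just before the write,
  // so that at most `writers` tasks touch the datastore at the same time.
  def narrowForWrite[T](rdd: RDD[T], writers: Int): RDD[T] = rdd.repartition(writers)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 4 threads, purely for demonstration.
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the compute-heavy part: 8 partitions, so up to 8 parallel tasks.
      val computed = sc.parallelize(1 to 1000, numSlices = 8).map(_ * 2)
      // Stand-in for the part that feeds the save: only 2 partitions remain.
      val narrowed = narrowForWrite(computed, 2)
      println(s"compute partitions: ${computed.getNumPartitions}")
      println(s"write partitions: ${narrowed.getNumPartitions}")
    } finally {
      sc.stop()
    }
  }
}
```

In the question's pipeline, the same idea would mean calling the save on the repartitioned RDD instead of on toSave directly.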
If your store supports bulk updates, you will get much better performance by calling foreachPartition
and writing each partition's data in chunks rather than one record at a time.
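The chunked-write idea can be sketched as a small helper that drains a partition's iterator in fixed-size batches. This is only an illustration under assumptions: BulkWriteSketch, writeInBatches, the batch size, and the bulkWrite callback are all hypothetical names, and the callback stands in for whatever bulk call your datastore client actually provides.

```scala
object BulkWriteSketch {
  /** Hands the records of one partition to `bulkWrite` in chunks of at most `batchSize`.
    * Returns the number of chunks that were sent.
    */
  def writeInBatches[T](records: Iterator[T], batchSize: Int)(bulkWrite: Seq[T] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var batches = 0
    // Iterator.grouped yields chunks lazily, so the whole partition is never held in memory.
    records.grouped(batchSize).foreach { chunk =>
      bulkWrite(chunk)
      batches += 1
    }
    batches
  }
}

// Hypothetical use inside a Spark job (client.bulkIndex is an assumed method, not a real API):
// toSave.repartition(2).foreachPartition { part =>
//   BulkWriteSketch.writeInBatches(part, 500)(chunk => client.bulkIndex(chunk))
// }
```

Keeping the batching logic in a plain function makes it easy to check without a cluster, and any connection to the datastore would typically be opened once per partition inside the foreachPartition body.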