
Spark - change parallelism during execution

I have a job divided in two parts:

  • The first part retrieves data from HBase using Spark
  • The second part runs heavy, CPU-intensive ML algorithms

The issue is that with a high number of executors/cores, the HBase cluster is queried too aggressively, which may cause production instability. With too few executors/cores, the ML computations take a long time to complete.

As the number of executors and cores is set at startup, I would like to know if there is a way to decrease the number of executors for the first part of the job.

I would obviously like to avoid running two separate jobs, as Hadoop would require, with mandatory disk serialization between the two steps.

Thanks for your help

I guess dynamic allocation is what you are looking for. This is something you can use with Spark Streaming as well.
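For reference, dynamic allocation is enabled through configuration rather than code changes. Below is a minimal sketch of the relevant settings (spark-shell style, values illustrative and to be tuned to what your HBase cluster and ML stage each tolerate); it assumes Spark 3.x so shuffle tracking can stand in for an external shuffle service:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: let Spark scale executors between a floor and a ceiling.
// The app name and all numeric bounds here are illustrative assumptions.
val spark = SparkSession.builder()
  .appName("hbase-read-then-ml")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // Dynamic allocation needs shuffle data to survive executor removal:
  // either an external shuffle service (spark.shuffle.service.enabled=true)
  // or, on Spark 3.x, shuffle tracking as below.
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()
```

Note that dynamic allocation scales on task backlog, so it helps release idle executors but does not by itself cap how hard the HBase scan hits the region servers; that is where the partitioning point below comes in.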

I think you may also have to play a little with your RDD size to balance data ingestion and data processing, but depending on what your real use case is, it can be quite challenging.
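To illustrate that balancing point, here is a hedged sketch: keep the ingestion stage at the modest parallelism dictated by the HBase regions, then repartition before the CPU-bound stage so the ML work fans out across all cores. The input path, the partition count of 200, and the runModel function are placeholders I made up, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession

object ReadThenCompute extends Serializable {
  // Placeholder for the CPU-heavy ML step.
  def runModel(row: String): Double = row.length.toDouble

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("read-then-compute").getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the HBase scan (e.g. newAPIHadoopRDD over TableInputFormat):
    // its partition count follows the number of regions/splits, so only that
    // many tasks query the region servers concurrently.
    val ingested = sc.textFile("hdfs:///tmp/example-input") // hypothetical path

    // Widen parallelism before the CPU-bound stage; 200 is illustrative.
    val widened = ingested.repartition(200)

    val results = widened.map(row => runModel(row))
    println(results.count())

    spark.stop()
  }
}
```

The repartition inserts a shuffle between the two stages, so the narrow ingestion and the wide computation run at different parallelism within a single job, without writing intermediate results to disk yourself.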
