Where do I configure spark executors and executor memory of a spark job in a dataproc cluster?

I am new to GCP and have been asked to use Dataproc to build a Spark application that brings data from a source database into BigQuery. I created a Dataproc cluster with the following options:

gcloud dataproc clusters create testcluster \
--enable-component-gateway --bucket <bucket_name> \
--region <region> \
--subnet <subnet_name> \
--no-address \
--zone <zone> \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 500 \
--metadata 'PIP_PACKAGES=pyspark==2.4.0' \
--initialization-actions <some_script.sh> \
--image-version 1.5-debian10 \
--project <project_name> \
--service-account=<account_name> \
--properties spark:spark.jars=<jar_path_of_source_db_in_bucket>,dataproc:dataproc.conscrypt.provider.enable=false \
--optional-components ANACONDA,JUPYTER

I am submitting the Spark job as follows: [screenshot]

What I don't understand is how to specify the number of executors and the executor memory. Could anyone tell me where and how I can pass the --num-executors and --executor-memory parameters to my Spark job?

You can pass them via the --properties option:

--properties=[PROPERTY=VALUE,…] List of key value pairs to configure Spark. For a list of available properties, see: https://spark.apache.org/docs/latest/configuration.html#available-properties .
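The two spark-submit flags from the question have property equivalents: --num-executors corresponds to spark.executor.instances, and --executor-memory corresponds to spark.executor.memory. A minimal sketch of building the comma-separated string that --properties expects follows; the values 2 and 4g are hypothetical and chosen only for illustration. Note that when dynamic allocation is enabled, spark.executor.instances acts as a starting count rather than a fixed one.

```shell
# Hypothetical values, for illustration only.
NUM_EXECUTORS=2      # what --num-executors would carry
EXECUTOR_MEMORY=4g   # what --executor-memory would carry

# Join the key=value pairs with commas, as --properties expects.
PROPS="spark.executor.instances=${NUM_EXECUTORS},spark.executor.memory=${EXECUTOR_MEMORY}"

# Print the resulting flag so it can be pasted into a gcloud command.
printf '%s\n' "--properties=${PROPS}"
```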

Example using gcloud command:

gcloud dataproc jobs submit pyspark path_main.py --cluster=$CLUSTER_NAME \
--region=$REGION \
--properties="spark.submit.deployMode"="cluster",\
"spark.dynamicAllocation.enabled"="true",\
"spark.shuffle.service.enabled"="true",\
"spark.executor.memory"="15g",\
"spark.driver.memory"="16g",\
"spark.executor.cores"="5"
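The memory values in this example are sized for larger workers than the n1-standard-4 machines in the question, which have 15 GB of RAM each. On YARN, each executor requests spark.executor.memory plus an overhead, which by default is 10% of the executor memory with a floor of 384 MiB (spark.executor.memoryOverhead). The sketch below is a rough check under that default, comparing the request against the machine's total RAM; the real limit is lower, since YARN only receives part of the machine's memory.

```shell
EXECUTOR_MEMORY_MB=15360   # spark.executor.memory=15g from the example above
MACHINE_MEMORY_MB=15360    # total RAM of an n1-standard-4 worker (15 GB)

# Default overhead: 10% of executor memory, but at least 384 MiB.
OVERHEAD_MB=$(( EXECUTOR_MEMORY_MB / 10 ))
if [ "$OVERHEAD_MB" -lt 384 ]; then
  OVERHEAD_MB=384
fi

# Memory that YARN must grant for one executor container.
CONTAINER_MB=$(( EXECUTOR_MEMORY_MB + OVERHEAD_MB ))
echo "Container request per executor: ${CONTAINER_MB} MiB"

if [ "$CONTAINER_MB" -gt "$MACHINE_MEMORY_MB" ]; then
  echo "Does not fit on this worker; lower spark.executor.memory."
else
  echo "Fits within total RAM; still check the YARN NodeManager limit."
fi
```

For workers of this size, a smaller spark.executor.memory is needed so that the container request stays below what YARN can allocate per node.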

Alternatively, you can set them in the UI: in the Properties section of the job submission form, click the ADD PROPERTY button and enter each key and value.
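If the same settings should apply to every job, they can also be set as cluster-wide defaults at creation time, using the same spark: prefix that the create command in the question uses for spark.jars. A minimal sketch, assuming hypothetical values of 4g and 2 cores and a placeholder default region; properties passed at job submission still override these defaults.

```shell
# Hypothetical defaults; the spark: prefix writes them to spark-defaults.conf.
CLUSTER_PROPS="spark:spark.executor.memory=4g,spark:spark.executor.cores=2"

# Printed for review rather than executed; drop 'echo' to run it.
# The other flags from the original create command are omitted here,
# and us-central1 is only a placeholder region.
echo gcloud dataproc clusters create testcluster \
  --region="${REGION:-us-central1}" \
  --properties="${CLUSTER_PROPS}"
```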
