I am running a PySpark application using Dataproc Serverless for Spark, and my Spark session configuration code looks like this:
import pyspark

# broadcast_timeout and progress_bar are defined elsewhere in the application
spark = (
    pyspark.sql.SparkSession.builder.appName("app_name")
    .config("spark.logConf", "true")
    .config("spark.sql.broadcastTimeout", broadcast_timeout)
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.ui.showConsoleProgress", progress_bar)
    .getOrCreate()
)
But the appName I set is not reflected in the Dataproc batch job console: in Dataproc -> Batches -> clicking on the Job ID -> Details tab -> Properties, spark:spark.app.name shows a random ID instead.
The Dataproc UI reflects properties set during batch submission, not properties that are set later in Spark application code. The spark.app.name value you see there is the default value for that property, which your Spark app then overrides at runtime. To have your custom name show up in the console, set the property when submitting the batch job:
gcloud dataproc batches submit \
    ... \
    --properties=spark.app.name="<MY_CUSTOM_APP_NAME>"
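Once the batch is submitted with that property, you can also confirm from inside the application which name Spark actually picked up. This is a minimal sketch, assuming an existing SparkSession named spark; it simply reads the effective value back through the standard PySpark APIs:

# Application name as seen by the running SparkContext
print(spark.sparkContext.appName)

# Equivalent check through the runtime configuration
print(spark.conf.get("spark.app.name"))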