
Dataproc - SparkSession.builder.appName not reflected on dataproc properties

I am running a PySpark application on Dataproc Serverless for Spark, and my SparkSession setup looks like this:

spark = (
    pyspark.sql.SparkSession.builder.appName("app_name")
    .config("spark.logConf", "true")
    .config("spark.sql.broadcastTimeout", broadcast_timeout)
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.ui.showConsoleProgress", progress_bar)
    .getOrCreate()
)

But the appName I set is not reflected in the Dataproc batch job console:

In Dataproc -> Batches -> Clicking on Job Id -> Details tab -> Properties: spark:spark.app.name gives me a random ID.

The Dataproc UI reflects the properties set during batch submission; it does not reflect properties that are set later in Spark application code. The spark.app.name value you see is the default value assigned at submission time, which your Spark app then overrides at runtime.

If you want your custom name to appear in the console, set this property when submitting the batch job:

gcloud dataproc batches submit \
  . . . \
  --properties=spark.app.name="<MY_CUSTOM_APP_NAME>"
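To confirm the property was applied at submission time, you can inspect the batch's runtime configuration afterwards. A minimal sketch, assuming a batch ID of `my-batch` and region `us-central1` (substitute your own values):

```shell
# Describe the submitted batch; the properties passed via --properties
# appear under runtimeConfig.properties in the output.
gcloud dataproc batches describe my-batch \
  --region=us-central1 \
  --format="value(runtimeConfig.properties)"
```

This shows the submission-time properties, which are what the Batches UI displays; values changed later in application code will not appear here.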

