
Dataproc - SparkSession.builder.appName not reflected on dataproc properties

I am running a PySpark application on Dataproc Serverless for Spark, and my SparkSession setup looks like this:

spark = (
    pyspark.sql.SparkSession.builder.appName("app_name")
    .config("spark.logConf", "true")
    .config("spark.sql.broadcastTimeout", broadcast_timeout)
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0")
    .config("spark.ui.showConsoleProgress", progress_bar)
    .getOrCreate()
)

But the appName I set is not reflected in the Dataproc batch job console:

In Dataproc -> Batches -> Clicking on Job Id -> Details tab -> Properties: spark:spark.app.name gives me a random ID.

The Dataproc UI reflects the properties set during batch submission; it does not reflect properties that are set later in Spark application code. The spark.app.name value you see is the default value assigned at submission time, which your Spark app then overrides at runtime.

If you want your custom name to appear in the console, set this property when submitting the batch job:

gcloud dataproc batches submit \
  . . . \
  --properties=spark.app.name="<MY_CUSTOM_APP_NAME>"
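To confirm the property was applied at submission time, you can inspect the batch's runtime configuration afterwards. A minimal sketch, assuming a batch ID of `my-batch` and region `us-central1` (substitute your own values):

```shell
# Describe the submitted batch; the properties passed via --properties
# appear under runtimeConfig.properties in the output.
gcloud dataproc batches describe my-batch \
  --region=us-central1 \
  --format="value(runtimeConfig.properties)"
```

This shows the submission-time properties, which are what the Batches UI displays; values changed later in application code will not appear here.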

