
new spark.sql.shuffle.partitions value not used after checkpointing

I have a Spark Structured Streaming application with checkpointing that writes its output as Parquet, using the default spark.sql.shuffle.partitions = 200. I need to change the number of shuffle partitions, but the new value is not picked up. Here is the content of a checkpoint offset file after the application is restarted:

{"batchWatermarkMs":1520054221000,"batchTimestampMs":1520054720003,"conf":{"spark.sql.shuffle.partitions":"200"}}
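You can verify which value a restarted query will use by inspecting the offset metadata yourself. A minimal sketch, using the JSON line above (in a real checkpoint, such lines live in files under the checkpoint's `offsets/` directory):

```python
import json

# One metadata line from a Structured Streaming offset file
# (copied from the question above).
offset_line = (
    '{"batchWatermarkMs":1520054221000,"batchTimestampMs":1520054720003,'
    '"conf":{"spark.sql.shuffle.partitions":"200"}}'
)

offset = json.loads(offset_line)

# The shuffle-partition count is pinned inside the checkpoint metadata,
# which is why a new --conf value on restart appears to be ignored.
print(offset["conf"]["spark.sql.shuffle.partitions"])  # → 200
```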

Do I need to set the number of partitions in the code instead of setting it with --conf?

The number is restored from the checkpoint; it will only change if you delete the checkpointed data and restart the query with a "clean slate".

This makes sense: if you have checkpointed data, Spark needs to know how many shuffle partitions the previous state was written with, so that it can restore that state from the correct number of partition directories.
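The restore rule can be sketched as a tiny precedence function. This is a simplified model of the observed behavior, not Spark's actual implementation; the function name and arguments are hypothetical:

```python
def effective_shuffle_partitions(checkpointed_conf, session_conf):
    """Simplified model: a value stored in the checkpoint's offset metadata
    takes precedence over the restarted session's configuration."""
    key = "spark.sql.shuffle.partitions"
    if checkpointed_conf and key in checkpointed_conf:
        return int(checkpointed_conf[key])
    return int(session_conf[key])

# Restart against an existing checkpoint: the stored value (200) wins,
# even though --conf asked for 50.
print(effective_shuffle_partitions(
    {"spark.sql.shuffle.partitions": "200"},
    {"spark.sql.shuffle.partitions": "50"}))  # → 200

# Fresh start (checkpoint deleted): the new --conf value applies.
print(effective_shuffle_partitions(
    None,
    {"spark.sql.shuffle.partitions": "50"}))  # → 50
```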
