
what is spark.databricks.delta.snapshotPartitions configuration used for in delta lake?

I was going through Delta Lake and came across the configuration spark.databricks.delta.snapshotPartitions, but I'm not quite sure what it is used for. I can't find it in the Delta Lake documentation either.

In the Delta Lake GitHub repository I found the code below, but I'm not sure how this property works:

  val DELTA_SNAPSHOT_PARTITIONS =
    buildConf("snapshotPartitions")
      .internal()
      .doc("Number of partitions to use when building a Delta Lake snapshot.")
      .intConf
      .checkValue(n => n > 0, "Delta snapshot partition number must be positive.")
      .createOptional

Delta Lake uses Spark to process the transaction logs in the _delta_log directory. When Delta Lake loads the transaction logs, it replays them to reconstruct the current state of the table, which is called a Snapshot. This step includes a repartition operation, and spark.databricks.delta.snapshotPartitions controls how many partitions that repartition uses. As your table's metadata grows, you may need to increase this setting so that each partition of the metadata still fits into executor memory.
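As a minimal sketch of where you would set it: since the configuration is read when the Spark session processes the log, you can supply it at session creation. The app name and table path below are hypothetical placeholders, not from the original post.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-snapshot-partitions-example")
  // Increase this if the replayed _delta_log state is too large for
  // each partition to fit in executor memory. It is an internal
  // setting, so there is no public documentation for its default.
  .config("spark.databricks.delta.snapshotPartitions", "50")
  .getOrCreate()

// Reading the table triggers snapshot construction, which repartitions
// the replayed log state across the configured number of partitions.
val df = spark.read.format("delta").load("/path/to/delta-table")
```

You could equally pass it on the command line with spark-submit's --conf flag; either way it must be set before the table's snapshot is first built in that session.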
