
what is spark.databricks.delta.snapshotPartitions configuration used for in delta lake?

I was going through Delta Lake and came across the configuration spark.databricks.delta.snapshotPartitions, but I'm not quite sure what it is used for. I can't find it in the Delta Lake documentation either.

In the Delta Lake GitHub repository I found the code below, but I'm not sure how this property works:

  val DELTA_SNAPSHOT_PARTITIONS =
    buildConf("snapshotPartitions")
      .internal()
      .doc("Number of partitions to use when building a Delta Lake snapshot.")
      .intConf
      .checkValue(n => n > 0, "Delta snapshot partition number must be positive.")
      .createOptional

Delta Lake uses Spark to process the transaction logs in the _delta_log directory. When Delta Lake loads the transaction logs, it replays them to reconstruct the current state of the table, which is called a Snapshot. This step includes a repartition operation, and spark.databricks.delta.snapshotPartitions controls how many partitions that repartition uses. As your table's metadata grows, you may need to increase this setting so that each partition of the metadata still fits into executor memory.
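As a minimal sketch of where you would set it: since the configuration is read when the Spark session processes the log, you can supply it at session creation. The app name and table path below are hypothetical placeholders, not from the original post.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-snapshot-partitions-example")
  // Increase this if the replayed _delta_log state is too large for
  // each partition to fit in executor memory. It is an internal
  // setting, so there is no public documentation for its default.
  .config("spark.databricks.delta.snapshotPartitions", "50")
  .getOrCreate()

// Reading the table triggers snapshot construction, which repartitions
// the replayed log state across the configured number of partitions.
val df = spark.read.format("delta").load("/path/to/delta-table")
```

You could equally pass it on the command line with spark-submit's --conf flag; either way it must be set before the table's snapshot is first built in that session.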
