
Databricks is "Updating the Delta table's state"

I'm reading and joining multiple Delta tables from a data lake and storing the result in another Delta Lake location. While this runs, Databricks shows the status "Updating the Delta table's state".

Depending on how many Delta tables I join, this can take a very long time. Even though the join itself only takes a few minutes, the state update can take up to an hour.

What is happening when I see "Updating the Delta table's state"? Can I optimize this somehow?

Thank you, Karthikeyan Rasipalay Durairaj. Posting your suggestion as an answer to help other community members.

Updating the Delta table's state.

The command status report means:

  • At the beginning of each query, Delta tables auto-update to the latest version of the table.
  • Delta Lake writes checkpoints as an aggregate state of a Delta table at an optimized frequency.
  • Databricks optimizes the performance of higher-order functions and DataFrame operations using nested types.
  • For Delta Lake on Databricks SQL optimization command reference information, refer
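One way to read the first two points: the "state" is the table snapshot that Delta rebuilds from its transaction log, starting from the most recent checkpoint and replaying every commit written after it. The sketch below is a toy Python model of that idea, not the Delta Lake API. The class `DeltaLogModel`, its methods, and the `(kind, path)` action tuples are hypothetical simplifications of Delta's add/remove file actions. The checkpoint interval of 10 mirrors Delta's default `delta.checkpointInterval`, but everything else is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified action: ("add", path) or ("remove", path).
Action = Tuple[str, str]
Commit = List[Action]


@dataclass
class DeltaLogModel:
    """Toy model of a Delta transaction log (not the real Delta API)."""
    checkpoint_interval: int = 10
    commits: List[Commit] = field(default_factory=list)
    # version -> full set of live files at that version (aggregate state)
    checkpoints: Dict[int, Set[str]] = field(default_factory=dict)

    def commit(self, actions: Commit) -> int:
        """Append a commit; write a checkpoint every `checkpoint_interval` versions."""
        self.commits.append(actions)
        version = len(self.commits) - 1
        if version > 0 and version % self.checkpoint_interval == 0:
            state, _ = self.load_state()
            self.checkpoints[version] = set(state)
        return version

    def load_state(self) -> Tuple[Set[str], int]:
        """Rebuild the latest snapshot: return (live files, commits replayed)."""
        latest = len(self.commits) - 1
        # Start from the newest checkpoint at or before the latest version, if any.
        start = max((v for v in self.checkpoints if v <= latest), default=None)
        files = set(self.checkpoints[start]) if start is not None else set()
        first = start + 1 if start is not None else 0
        # Replay every commit written after that checkpoint, in order.
        for actions in self.commits[first:latest + 1]:
            for kind, path in actions:
                if kind == "add":
                    files.add(path)
                elif kind == "remove":
                    files.discard(path)
        return files, latest + 1 - first


log = DeltaLogModel(checkpoint_interval=10)
for i in range(25):
    log.commit([("add", f"part-{i:05d}.parquet")])
files, replayed = log.load_state()
print(len(files), replayed)  # 25 4
```

With 25 commits and checkpoints at versions 10 and 20, rebuilding the snapshot replays only the 4 commits after version 20. With no checkpoints it would replay all 25. In this model, the cost of the state update grows with the number of commits since the last checkpoint, and each table read in a join needs its own snapshot.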

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM