Databricks is "Updating the Delta table's state"

I'm reading and joining multiple Delta tables from a data lake and storing the result back to another Delta Lake location. When doing so, Databricks shows me: [screenshot: "Updating the Delta table's state" progress message]

Depending on how many Delta tables I join with each other, this can take a very long time. Even though the join itself takes only a few minutes, the state update takes up to an hour.

What is happening when I see "Updating the Delta table's state"? Can I somehow optimize this?
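For intuition, the "Updating the Delta table's state" phase is when Delta reconstructs the table's current state from its transaction log: it loads the most recent checkpoint and then replays every commit written after it. Below is a minimal pure-Python sketch of that replay; the dictionaries and file names are illustrative only, not the actual Delta log format (real logs live under `_delta_log/` as Parquet checkpoints and JSON commit files).

```python
# Illustrative stand-ins for a Delta checkpoint and the commits after it.
checkpoint = {"version": 10, "files": {"part-0001", "part-0002"}}

commits = [  # commits written after the checkpoint, replayed in order
    {"version": 11, "add": ["part-0003"], "remove": []},
    {"version": 12, "add": ["part-0004"], "remove": ["part-0001"]},
]

def replay_state(checkpoint, commits):
    """Rebuild the set of live data files: start from the checkpoint
    snapshot, then apply each commit's add/remove actions in version order."""
    files = set(checkpoint["files"])
    for commit in sorted(commits, key=lambda c: c["version"]):
        files |= set(commit["add"])
        files -= set(commit["remove"])
    return files

state = replay_state(checkpoint, commits)
print(sorted(state))  # the table's current list of live data files
```

The cost of this phase grows with the number of commits since the last checkpoint and the number of file actions per commit, which is why tables with many small writes (or queries touching many such tables) spend a long time here.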

Thank you Karthikeyan Rasipalay Durairaj; posting your suggestion as an answer to help other community members.

Updating the Delta table's state.

This status message means:

  • At the beginning of each query, Delta tables auto-update to the latest version of the table.
  • Delta Lake writes checkpoints as an aggregate state of a Delta table at an optimized frequency.
  • Databricks optimizes the performance of higher-order functions and DataFrame operations using nested types.
  • For Delta Lake on Databricks SQL optimization command reference information, refer to the optimization command reference.
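If the state update dominates your job's runtime, a common mitigation is to compact small files and control how often Delta writes checkpoints, so fewer commit files have to be replayed per query. A hedged PySpark sketch (the table name `my_db.my_table` is a placeholder; `OPTIMIZE` and the `delta.checkpointInterval` table property are standard Delta Lake on Databricks features, but tune the interval to your workload):

```python
# Sketch only: assumes a Databricks/PySpark session with Delta Lake available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact many small data files into larger ones, reducing the number of
# file actions the state reconstruction has to process.
spark.sql("OPTIMIZE my_db.my_table")

# Control how often Delta writes a checkpoint (one every N commits), so
# fewer JSON commit files accumulate between checkpoints.
spark.sql(
    "ALTER TABLE my_db.my_table "
    "SET TBLPROPERTIES ('delta.checkpointInterval' = '10')"
)
```

Avoiding very frequent tiny writes to the source tables also keeps the transaction log short between checkpoints.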

