
How to add an auto-increment column to an existing Delta table in Databricks

In Databricks I have an existing Delta table to which I want to add one more column, Id, so that each row has a unique, consecutive id number (like a primary key in SQL).

So far I have tried converting the Delta table to a PySpark DataFrame and adding the new column as follows:


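from pyspark.sql.window import Window as W
from pyspark.sql import functions as F

# Temporary id: increasing and unique, but not consecutive
df1 = df1.withColumn("idx", F.monotonically_increasing_id())
windowSpec = W.orderBy("idx")
# Replace it with a consecutive row number starting at 1
df1 = df1.withColumn("idx", F.row_number().over(windowSpec))
df1.show()  # show() returns None, so it must not be assigned back to df1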

I tried writing it back to the Delta table:

df1.write.mode("append").format("delta").save("location/db.tablename")

The write succeeds, but when I query the table the values in the new id column are null. I have read that overwrite mode will erase all previous data. How can I get the id column data into the Delta table and keep incrementing the id column when new data is inserted?

I am trying to add an auto-increment column to a Delta table. The Databricks runtime is 7.3.


From the official documentation:

The identity column feature is supported on Databricks Runtime 10.4 and later, not on runtimes below 10.4.

Altering a table to add a new identity column is also not supported.

To achieve your goal, you first have to migrate from runtime 7.3 to runtime 10.4, then create a new table with an identity column, and then copy the data from the first table to the new table.
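The create-and-copy step could be sketched roughly as follows, assuming a cluster already on runtime 10.4 or later. The helper names (`identity_table_ddl`, `copy_rows_sql`, `migrate`), the `id` column name, and the example table and column names are all hypothetical; only the `GENERATED ALWAYS AS IDENTITY` clause and `spark.sql` come from Databricks and PySpark. Treat it as a minimal sketch rather than a tested migration script.

```python
def identity_table_ddl(table, columns, id_col="id"):
    """Build a CREATE TABLE statement whose first column is an identity column.

    `columns` is a list of (name, sql_type) pairs for the existing columns.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} "
        f"({id_col} BIGINT GENERATED ALWAYS AS IDENTITY, {cols}) USING DELTA"
    )


def copy_rows_sql(source, target, columns):
    """Build an INSERT that copies the existing columns and leaves the
    identity column out, so Delta generates its values."""
    names = ", ".join(name for name, _ in columns)
    return f"INSERT INTO {target} ({names}) SELECT {names} FROM {source}"


def migrate(spark, source, target, columns):
    """Create the new table, then copy the data from the old one.

    `spark` is the SparkSession that Databricks notebooks provide.
    """
    spark.sql(identity_table_ddl(target, columns))
    spark.sql(copy_rows_sql(source, target, columns))
```

In a notebook this might be called as `migrate(spark, "db.tablename", "db.tablename_with_id", [("name", "STRING"), ("amount", "DOUBLE")])`, with the column list replaced by the real schema. Later inserts that omit the id column get new values automatically. Note that identity values are unique and increasing but are not guaranteed to be consecutive, so gaps can appear.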

