How to add an auto-increment column to an existing Delta table in Databricks

Question

In Databricks I have an existing Delta table to which I want to add one more column, Id, so that each row has a unique, consecutive id number (like a primary key in SQL).

So far I have tried converting the Delta table to a PySpark DataFrame and adding the new column as follows:


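from pyspark.sql.window import Window as W
from pyspark.sql import functions as F
# Temporary ordering key: unique and increasing, but not consecutive
df1 = df1.withColumn("idx", F.monotonically_increasing_id())
windowSpec = W.orderBy("idx")
# row_number() turns the ordering key into consecutive ids 1, 2, 3, ...
df1 = df1.withColumn("idx", F.row_number().over(windowSpec))
df1.show()  # show() returns None, so call it separately instead of assigning its result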

I tried writing it back to the Delta table:

df.write.mode("append").format("delta").save("location/db.tablename")  # placeholder path

The write succeeds, but when I query the table the values in the new id column are null. I read that overwrite mode will erase all previous data. How can I bring the id column data into the Delta table and keep incrementing the id column when data gets inserted?

I am trying to add an auto-increment column to a Delta table. The Databricks runtime is 7.3.

Answer 1

"I am trying to add an auto-increment column to a Delta table. The Databricks runtime is 7.3."

According to the official documentation:

Identity columns are supported on Databricks Runtime 10.4 and later, not on earlier runtimes.

Adding a new identity column to an existing table with ALTER TABLE is also not supported.

To achieve your goal, first migrate from runtime 7.3 to runtime 10.4, then create a new table with an identity column and copy the data from the original table into it.
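
As a rough illustration of the create-and-copy step, the sketch below only builds the two SQL statements as strings, so it runs without a cluster. The helper name build_identity_migration_sql, the target table db.tablename_with_id, and the columns name and amount are hypothetical placeholders, not taken from the question. Note that identity values in Delta are unique and increasing, but they are not guaranteed to be consecutive.

```python
from typing import Dict, List


def build_identity_migration_sql(
    old_table: str, new_table: str, columns: Dict[str, str], id_column: str = "id"
) -> List[str]:
    """Return SQL that copies old_table into a new Delta table with an identity column.

    All table and column names passed in are hypothetical placeholders.
    """
    if id_column in columns:
        raise ValueError(f"{id_column!r} already exists in the source columns")
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    column_list = ", ".join(columns)
    create_stmt = (
        f"CREATE TABLE {new_table} ("
        f"{id_column} BIGINT GENERATED ALWAYS AS IDENTITY, {column_defs}"
        f") USING DELTA"
    )
    # The identity column is left out of the INSERT so that Delta generates it.
    insert_stmt = (
        f"INSERT INTO {new_table} ({column_list}) "
        f"SELECT {column_list} FROM {old_table}"
    )
    return [create_stmt, insert_stmt]


statements = build_identity_migration_sql(
    "db.tablename", "db.tablename_with_id", {"name": "STRING", "amount": "DOUBLE"}
)
for stmt in statements:
    print(stmt)
    # On a Databricks cluster (runtime 10.4 or later) you would run: spark.sql(stmt)
```

Because the id column is omitted from the INSERT, later inserts into the new table that also omit it should keep receiving generated ids, which covers the "keep incrementing when data gets inserted" part of the question.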

Solution 1
1 2022-07-13 12:33:21
