How to add an auto-increment column to an existing Delta table in Databricks
In Databricks I have an existing Delta table to which I want to add one more column, Id, so that each row has a unique, consecutive ID number (like a primary key in SQL).
So far I have tried converting the Delta table to a PySpark DataFrame and adding the new column as follows:
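from pyspark.sql.window import Window as W
from pyspark.sql import functions as F
# Temporary increasing (but non-consecutive) id, used only for ordering
df1 = df1.withColumn("idx", F.monotonically_increasing_id())
windowSpec = W.orderBy("idx")
# Replace it with consecutive row numbers 1..N
df1 = df1.withColumn("idx", F.row_number().over(windowSpec))
df1.show()  # show() returns None, so its result must not be assigned back to df1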
I tried writing it back to the Delta table:
df.write.mode("append").format("delta").save("location/db.tablename")
The write succeeds, but when I query the table the values in the new id column are null. I have read that overwrite mode will erase all previous data. How can I bring the id column data into the Delta table and keep incrementing the id column when new data is inserted?
I am trying to add an auto-increment column to a Delta table. The Databricks runtime is 7.3.
From the official documentation:
The identity column feature is supported on Databricks Runtime 10.4 and later; it is not available on runtimes below 10.4.
Adding a new identity column to an existing table with ALTER TABLE is also not supported.
To achieve your goal, first migrate from runtime 7.3 to runtime 10.4. Then create a new table with an identity column and copy the data from the first table into the new table.
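As a rough sketch of the "create a new table, then copy" step, the helper below only builds the two SQL statements and passes them to spark.sql. The table names, column names, column types, and the helper names themselves are hypothetical placeholders, not taken from the question; it assumes a cluster already on Databricks Runtime 10.4 or later and that `spark` is the SparkSession a Databricks notebook provides.

```python
from typing import List, Sequence, Tuple


def build_identity_migration_sql(
    source_table: str,
    target_table: str,
    columns: Sequence[Tuple[str, str]],
    id_column: str = "id",
) -> List[str]:
    """Return SQL for copying source_table into a new Delta table whose
    first column is an identity column.

    columns holds (name, type) pairs for the existing columns. All names
    used with this helper are hypothetical placeholders.
    """
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    column_names = ", ".join(name for name, _ in columns)

    create_stmt = (
        f"CREATE TABLE {target_table} ("
        f"{id_column} BIGINT GENERATED ALWAYS AS IDENTITY, {column_defs}"
        f") USING DELTA"
    )
    # The identity column is left out of the insert so Delta generates it.
    insert_stmt = (
        f"INSERT INTO {target_table} ({column_names}) "
        f"SELECT {column_names} FROM {source_table}"
    )
    return [create_stmt, insert_stmt]


def migrate_with_identity(spark, source_table, target_table, columns):
    """Run the generated statements (assumes Databricks Runtime 10.4+)."""
    for statement in build_identity_migration_sql(
        source_table, target_table, columns
    ):
        spark.sql(statement)


if __name__ == "__main__":
    # Hypothetical names; the statements are printed here, not executed.
    for stmt in build_identity_migration_sql(
        "db.tablename",
        "db.tablename_with_id",
        [("name", "STRING"), ("amount", "DOUBLE")],
    ):
        print(stmt)
```

In a notebook this would be called as migrate_with_identity(spark, "db.tablename", "db.tablename_with_id", [...]) with the real column list. Later inserts into the new table that omit the id column should get fresh values automatically. One caveat: as far as I know, the Databricks documentation describes identity values as unique and increasing but not guaranteed to be consecutive, so gaps are possible.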