
How to create a nested generated column (as part of StructType) in a delta table?

Has anyone created a nested generated column in a Delta table? Something like the schema below:

schema
|- metadata: struct
|  |- id: bigint            // <-- generated column
|- data: string
|- created_at: timestamp

I know I can use DeltaTable.createOrReplace and call addColumn or addColumns with a Spark struct. However, I am unsure how to make a nested column generated, or how to indicate in a Spark schema that a certain column should be generated.

Does anyone have an idea how to achieve this, or whether it is possible at all?

Let's create a StructType.

import org.apache.spark.sql.types._

val metadata = StructType(
  StructField("long", LongType, nullable = false) ::
  StructField("str", StringType, nullable = false) :: Nil)

Please note that the StructType uses nullable = false for every field, as that seems to be required. If the fields are left nullable (the default for StructField), you may run into this mysterious exception:

The expression type of the generated column metadata is STRUCT<`long`: BIGINT, `str`: STRING>,
but the column type is STRUCT<`long`: BIGINT, `str`: STRING>

(Yes, that's correct: the two types in the message look identical. The exception is not user-friendly; the mismatch comes from the nullable flags being true, and the message does not show them.)
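To make the hidden difference visible, here is a minimal sketch that compares two struct types with the same field names and data types but different nullability. The names strict and lenient are made up for illustration; only the standard Spark SQL types API is used.

```scala
import org.apache.spark.sql.types._

// Same shape as the struct above: every field is non-nullable.
val strict = StructType(
  StructField("long", LongType, nullable = false) ::
  StructField("str", StringType, nullable = false) :: Nil)

// Hypothetical variant that leaves nullable at its default (true).
val lenient = StructType(
  StructField("long", LongType) ::
  StructField("str", StringType) :: Nil)

// Field names and data types are the same...
assert(strict.fieldNames.sameElements(lenient.fieldNames))
assert(strict.fields.map(_.dataType).sameElements(lenient.fields.map(_.dataType)))

// ...but the two types are not equal, because nullability is part of the type.
assert(strict != lenient)

// printTreeString() includes the nullable flag of every field.
strict.printTreeString()
lenient.printTreeString()
```

Printing both trees side by side is a quick way to spot a nullability mismatch when an error message shows two seemingly identical types.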

Once you've got the data type, a Delta table with a generated column can be built as follows:

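import io.delta.tables.DeltaTable
import org.apache.spark.sql.types._

// tableName is a String holding the name of the target table, defined elsewhere.
DeltaTable.createOrReplace()
  // Plain source column that the generation expression reads from.
  .addColumn("id", LongType, nullable = false)
  // Struct column whose value is always computed from the expression below.
  .addColumn(
    DeltaTable.columnBuilder("metadata")
      .dataType(metadata)
      .generatedAlwaysAs("struct(id AS long, 'hello' AS str)")
      .build())
  .tableName(tableName)
  .execute()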

The trick was to write a generation expression whose type matches the declared column type exactly (which became obvious to me only after I finished this challenge :)).
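One way to check the match up front is to evaluate the expression outside Delta and inspect the type Spark infers for it. The sketch below is only an approximation: it uses spark.range(1) as a stand-in for the id column and starts a local session (spark-shell already provides spark). The value name exprType and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Local session for experimentation only; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("expr-type-check")
  .getOrCreate()

// Evaluate the generation expression against a stand-in for the `id` column
// and pull out the data type Spark infers for the resulting struct.
val exprType = spark.range(1)
  .selectExpr("struct(id AS long, 'hello' AS str) AS metadata")
  .schema("metadata")
  .dataType

// Print the struct's fields together with their nullable flags.
exprType match {
  case st: StructType => st.printTreeString()
  case other          => println(other)
}
```

The printed nullable flags can then be compared with the StructType passed to dataType. Keep in mind that the stand-in column may not have the same nullability as the real table column, so treat the result as a hint rather than a guarantee.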

Append some rows (not sure why INSERT does not work).

spark.range(5).writeTo(tableName).append()

And you should end up with the following table:

scala> spark.table(tableName).show
+---+----------+
| id|  metadata|
+---+----------+
|  3|{3, hello}|
|  4|{4, hello}|
|  1|{1, hello}|
|  2|{2, hello}|
|  0|{0, hello}|
+---+----------+
