
Add new column with literal value to a struct column in Dataframe in Spark Scala

I have a dataframe with the following schema:

root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)

I need to add a column inside event.data so that the schema looks like this:

root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
 |    |    |-- needtoaddit: int (nullable = true)

I tried:

  • dataframe.withColumn("event.data.needtoaddit", lit("added"))

    but it adds a new top-level column literally named event.data.needtoaddit.

  • dataframe.withColumn("event", struct($"event.*", struct(lit("added").as("needtoaddit")).as("data")))

    but it creates an ambiguous column event.data (event ends up with two fields named data), and I run into problems again.

How can I make it work?
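Both failed attempts can be reproduced with a small sketch. It assumes a local SparkSession (master and app name are arbitrary choices) and builds a dataframe with the same event.data.codevent nesting as the question. It shows that withColumn treats a dotted name as one flat column name, and that struct($"event.*", ...) keeps the existing data field and appends a second field with the same name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StructType

// Local session for experimenting; settings are arbitrary
val spark = SparkSession.builder().master("local[1]").appName("nested-field-attempts").getOrCreate()

// Same nesting as the question: docnumber, event.data.codevent
val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// Attempt 1: withColumn does not interpret the dots, so it creates a new
// top-level column whose name literally contains them
val attempt1 = df.withColumn("event.data.needtoaddit", lit("added"))
println(attempt1.columns.mkString(", "))
// Such a name has to be quoted with backticks to be selected again
attempt1.select(col("`event.data.needtoaddit`")).show()

// Attempt 2: event.* already expands to a field named data, and a second
// struct aliased "data" is appended next to it, so event holds two "data" fields
val attempt2 = df.withColumn(
  "event",
  struct(col("event.*"), struct(lit("added").as("needtoaddit")).as("data"))
)
val eventFields = attempt2.schema("event").dataType.asInstanceOf[StructType].fieldNames
println(eventFields.mkString(", "))
```

With two sibling fields named data, any later reference to event.data cannot be resolved unambiguously, which matches the problem described above.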

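You were pretty close. The new field has to go inside a rebuilt data struct, not next to the old one. Try this code: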

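import org.apache.spark.sql.functions.{lit, struct}
import spark.implicits._ // for the $"..." column syntax

// Rebuild event around a new data struct that holds the existing data
// fields plus the literal. event is recreated from scratch, so any other
// fields it had would have to be listed here as well.
val df2 = df.withColumn(
    "event",
    struct(
        struct(
            $"event.data.*",
            lit("added").as("needtoaddit")
        ).as("data")
    )
)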
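A self-contained sketch of the struct-rebuilding approach, runnable in a local session (the session settings are assumptions). It also includes a hypothetical helper, addFieldToEventData, which is not part of the answer: it rebuilds event from its own schema so that fields next to data survive. The sibling field source is made up purely to demonstrate that.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[1]").appName("rebuild-struct").getOrCreate()
import spark.implicits._ // enables the $"..." syntax used in the answer

val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// The answer's approach: rebuild event around a new data struct
val df2 = df.withColumn(
  "event",
  struct(struct($"event.data.*", lit("added").as("needtoaddit")).as("data"))
)
df2.printSchema()

// Hypothetical helper: rebuild event field by field from its schema, so
// siblings of data are copied over and only data gets the new field
def addFieldToEventData(input: DataFrame, name: String, value: Column): DataFrame = {
  val eventType = input.schema("event").dataType.asInstanceOf[StructType]
  val rebuilt = eventType.fieldNames.toSeq.map {
    case "data" => struct(col("event.data.*"), value.as(name)).as("data")
    case other  => col(s"event.`$other`").as(other)
  }
  input.withColumn("event", struct(rebuilt: _*))
}

// Hypothetical sibling field "source", used only to show that it is kept
val withSibling = df.withColumn(
  "event",
  struct(col("event.data").as("data"), lit("web").as("source"))
)
val df3 = addFieldToEventData(withSibling, "needtoaddit", lit("added"))
df3.printSchema()
```

On Spark versions before 3.1, which lack withField, rebuilding the struct like this is the usual workaround; the helper simply avoids hard-coding the sibling fields.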

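Spark 3.1+

To add a field to a struct column, use withField. A dotted field name reaches into nested structs, so the field can be added to event.data while keeping the whole event struct:

col("event").withField("data.needtoaddit", lit("added"))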
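The level at which withField is applied decides what comes back. This sketch (local session settings are assumptions) contrasts calling it on event.data, which returns only the updated data struct, with calling it on event with the path "data.needtoaddit", which returns the whole event struct. It also shows that withField replaces a field that already exists instead of duplicating it; the value 99 is arbitrary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

val spark = SparkSession.builder().master("local[1]").appName("withfield-paths").getOrCreate()

val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// On event.data: the result is just the data struct with the new field
val dataOnly = df.select(col("event.data").withField("needtoaddit", lit("added")).as("data"))
dataOnly.printSchema()

// On event with a dotted path: the result is the full event struct,
// with the new field placed inside data
val wholeEvent = df.select(col("event").withField("data.needtoaddit", lit("added")).as("event"))
wholeEvent.printSchema()

// An existing field name is replaced, not duplicated
val replaced = df.select(col("event").withField("data.codevent", lit(99)).as("event"))
replaced.show()
```

So writing col("event.data").withField(...) back into the event column would drop one nesting level; to keep event.data intact, apply withField to event itself.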

Input:

import org.apache.spark.sql.functions.{col, lit, struct}

val df = spark.createDataFrame(Seq(("1", 2)))
    .select(
        col("_1").as("docnumber"),
        struct(struct(col("_2").as("codevent")).as("data")).as("event")
    )
df.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)

Script:

// Apply withField to event (not event.data) so the event -> data nesting is kept
val df2 = df.withColumn(
    "event",
    col("event").withField("data.needtoaddit", lit("added"))
)

df2.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
//  |    |    |-- needtoaddit: string (nullable = false)
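The target schema in the question declares needtoaddit as a nullable int, whereas lit("added") produces a non-nullable string field. A minimal sketch, assuming the same local setup as above, that passes a typed null literal to withField instead so the new field is a nullable integer; if every row should carry a concrete value, an integer literal such as lit(0) (an arbitrary example value) would work the same way but yield a non-nullable field.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.{IntegerType, StructType}

val spark = SparkSession.builder().master("local[1]").appName("withfield-int").getOrCreate()

val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// A null literal cast to int gives an IntegerType field that is nullable,
// matching "needtoaddit: int (nullable = true)" from the question
val df2 = df.withColumn(
  "event",
  col("event").withField("data.needtoaddit", lit(null).cast(IntegerType))
)
df2.printSchema()

// Look up the new field in the resulting schema
val added = df2.schema("event").dataType.asInstanceOf[StructType]("data")
  .dataType.asInstanceOf[StructType]("needtoaddit")
```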


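Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.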

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM