Add new column with literal value to a struct column in Dataframe in Spark Scala
I have a dataframe with the following schema:
root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
I need to add a column inside event.data so that the schema looks like this:
root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
 |    |    |-- needtoaddit: int (nullable = true)
I tried
dataframe.withColumn("event.data.needtoaddit", lit("added"))
but it adds a top-level column literally named event.data.needtoaddit. I also tried
dataframe.withColumn( "event", struct( $"event.*", struct( lit("added").as("needtoaddit") ).as("data") ) )
but it creates an ambiguous column named event.data, and I am stuck again.
How can I make this work?
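You're close. Try this code:
import org.apache.spark.sql.functions.{lit, struct}
import spark.implicits._  // enables the $"..." syntax; assumes a SparkSession named spark

// Rebuild event around a new data struct: the existing data fields plus the new one.
// Any other field of event would have to be listed here too, or it is dropped.
val df2 = df.withColumn(
  "event",
  struct(
    struct(
      $"event.data.*",
      lit("added").as("needtoaddit")
    ).as("data")
  )
)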
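One thing to watch with the rebuild approach: struct(...) creates event from scratch, so only the fields listed inside it survive. A minimal sketch of how sibling fields could be carried over, assuming a local SparkSession and a hypothetical extra field named source inside event (neither is part of the original question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("struct-rebuild-sketch")
  .getOrCreate()

// Hypothetical input: event holds data plus a sibling field named source.
val df = spark.createDataFrame(Seq(("1", 2, "web")))
  .select(
    col("_1").as("docnumber"),
    struct(
      struct(col("_2").as("codevent")).as("data"),
      col("_3").as("source")
    ).as("event")
  )

// Rebuild event field by field: the extended data struct first,
// then every sibling that should be kept, listed by name.
val df2 = df.withColumn(
  "event",
  struct(
    struct(col("event.data.*"), lit("added").as("needtoaddit")).as("data"),
    col("event.source").as("source")
  )
)

df2.printSchema()
```

Listing siblings by hand gets tedious for wide structs, which is the main reason to prefer withField when Spark 3.1 or later is available.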
Spark 3.1+
To add a field to a struct column, use withField. A dotted field name reaches into a nested struct:
col("event").withField("data.needtoaddit", lit("added"))
Input:
import org.apache.spark.sql.functions.{col, lit, struct}

val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )
df.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
Script:
// The dotted name makes withField update the nested data struct in place,
// so event keeps its shape and only data gains a field.
val df2 = df.withColumn(
  "event",
  col("event").withField("data.needtoaddit", lit("added"))
)
df2.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
//  |    |    |-- needtoaddit: string (nullable = false)
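The type of the new field follows the literal passed in: lit("added") produces a string field, whereas the target schema in the question asks for an int. A minimal sketch of the integer variant, assuming Spark 3.1+, a local SparkSession, and the placeholder value 1 (the session setup and the value are not from the original post). It uses the dotted-name form of withField so that event keeps its nested data struct:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, struct}

// Local session for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("withfield-int-sketch")
  .getOrCreate()

// Same shape as the input used in the answer.
val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )

// lit(1) is an integer literal, so needtoaddit becomes an integer field.
// The value 1 is a placeholder chosen for this sketch.
val withInt = df.withColumn(
  "event",
  col("event").withField("data.needtoaddit", lit(1))
)

// Read the nested field back to confirm where it landed.
withInt.select(col("event.data.codevent"), col("event.data.needtoaddit")).show()
```

If the value has to come from a string, lit("1").cast("int") would be one way to get the same field type.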