How to update column values in an array of structs in Spark Scala

root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- Animal: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- Elephant: string (nullable = false)
 |    |    |-- Lion: string (nullable = true)
 |    |    |-- Zebra: string (nullable = true)
 |    |    |-- Dog: string (nullable = true)

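I have a list of fields that I don't want to update, and I want to set every other field in the array of structs to some value. For example, given a list List[String] = List(Zebra,Dog), is it possible to set all the other fields in the array to 0, so that Elephant and Lion become 0?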

+---+----+-----+------+-------+--------------------+
|_id|h   |inc  |op    |ts     |webhooks            |
+---+----+-----+------+-------+--------------------+
|fa1|fa11|fa111|fa1111|fa11111|[[1, 1, 0, 1]]      |
|fb1|fb11|fb111|fb1111|fb11111|[[0, 1, 1, 0]]      |
+---+----+-----+------+-------+--------------------+
After the operation, it should be:
+---+----+-----+------+-------+--------------------+
|_id|h   |inc  |op    |ts     |webhooks            |
+---+----+-----+------+-------+--------------------+
|fa1|fa11|fa111|fa1111|fa11111|[[0, 0, 0, 1]]      |
|fb1|fb11|fb111|fb1111|fb11111|[[0, 0, 1, 0]]      |
+---+----+-----+------+-------+--------------------+

I tried iterating row by row, with a function like this:

def changeValue(row :Row) = {
//some code
}

but I could not get it to work this way.

Check the code below.

scala> ddf.show(false)
+---+----+-----+------+-------+--------------------+
|_id|h   |inc  |op    |ts     |webhooks            |
+---+----+-----+------+-------+--------------------+
|fa1|fa11|fa111|fa1111|fa11111|[[1, 11, 111, 1111]]|
|fb1|fb11|fb111|fb1111|fb11111|[[2, 22, 222, 2222]]|
+---+----+-----+------+-------+--------------------+


scala> val columnsTobeUpdatedInWebhooks = Seq("zebra","dog") // Fields kept as-is in webhooks; every other field is set to 0.
columnsTobeUpdatedInWebhooks: Seq[String] = List(zebra, dog)

Construct the expression

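import org.apache.spark.sql.functions._

val expr = flatten(
    array(
        ddf
        .select(explode($"webhooks").as("webhooks"))
        .select("webhooks.*")
        .columns
        // Keep the listed fields unchanged; replace every other field with 0.
        .map(c => if (columnsTobeUpdatedInWebhooks.contains(c)) col(s"webhooks.${c}").as(c) else array(lit(0)).as(c)): _*
    )
)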

expr: org.apache.spark.sql.Column = flatten(array(array(0) AS `elephant`, array(0) AS `lion`, webhooks.zebra AS `zebra`, webhooks.dog AS `dog`))
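The expression above gets the struct field names by exploding the array and selecting "webhooks.*". As a minimal sketch of another way to get the same list, the names can be read straight from the DataFrame schema. The helper name structFieldNames and the hand-built schema below are assumptions for illustration (string fields are assumed, as in the question); with a real DataFrame you would pass ddf.schema instead.

```scala
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical helper (not part of the answer above): returns the field names
// of the structs stored in the array column `arrayCol`.
def structFieldNames(schema: StructType, arrayCol: String): Seq[String] =
  schema(arrayCol).dataType match {
    case ArrayType(element: StructType, _) => element.fieldNames.toSeq
    case other =>
      throw new IllegalArgumentException(s"$arrayCol is not an array of structs: $other")
  }

// Hand-built stand-in for ddf.schema, reduced to two columns (assumed types).
val element = StructType(Seq(
  StructField("elephant", StringType, nullable = false),
  StructField("lion", StringType),
  StructField("zebra", StringType),
  StructField("dog", StringType)
))
val schema = StructType(Seq(
  StructField("_id", StringType),
  StructField("webhooks", ArrayType(element, containsNull = false), nullable = false)
))

val names = structFieldNames(schema, "webhooks")
println(names.mkString(", "))
```

The returned names can then be mapped over in the same way as the .columns result in the expression above.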

Apply the expression

scala> ddf.withColumn("webhooks",struct(expr)).show(false)
+---+----+-----+------+-------+--------------+
|_id|h   |inc  |op    |ts     |webhooks      |
+---+----+-----+------+-------+--------------+
|fa1|fa11|fa111|fa1111|fa11111|[[0, 0, 0, 1]]|
|fb1|fb11|fb111|fb1111|fb11111|[[0, 0, 1, 0]]|
+---+----+-----+------+-------+--------------+

Final schema

scala> ddf.withColumn("webhooks",allwebhookColumns).printSchema
root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- webhooks: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- elephant: integer (nullable = false)
 |    |    |-- lion: integer (nullable = false)
 |    |    |-- zebra: integer (nullable = false)
 |    |    |-- dog: integer (nullable = false)
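For comparison, here is a minimal sketch of a different technique from the one above: rebuilding each struct in place with the transform higher-order function, which keeps the column as an array of structs. It assumes Spark 3.0 or later, where functions.transform is available in the Scala API. The helper name zeroOutExcept, the sample rows, and the string-typed fields (taken from the question's schema) are assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct, transform}
import org.apache.spark.sql.types.{ArrayType, StructType}

// Hypothetical helper: rebuilds every struct in `arrayCol`, keeping the fields
// named in `keep` and setting all other fields to the string "0".
// Field-name matching is case-sensitive.
def zeroOutExcept(df: DataFrame, arrayCol: String, keep: Seq[String]): DataFrame = {
  val fieldNames: Seq[String] = df.schema(arrayCol).dataType match {
    case ArrayType(st: StructType, _) => st.fieldNames.toSeq
    case other =>
      throw new IllegalArgumentException(s"$arrayCol is not an array of structs: $other")
  }
  // transform applies the lambda to each array element (a struct here).
  val rebuilt = transform(col(arrayCol), (elem: Column) =>
    struct(fieldNames.map { name =>
      if (keep.contains(name)) elem.getField(name).as(name)
      else lit("0").as(name)
    }: _*)
  )
  df.withColumn(arrayCol, rebuilt)
}

// Local session and sample data modelled on the question (assumed values).
val spark = SparkSession.builder().master("local[1]").appName("zero-out-sketch").getOrCreate()

val ddf = spark.sql(
  """SELECT 'fa1' AS _id,
    |       array(named_struct('Elephant', '1', 'Lion', '1', 'Zebra', '0', 'Dog', '1')) AS webhooks
    |UNION ALL
    |SELECT 'fb1' AS _id,
    |       array(named_struct('Elephant', '0', 'Lion', '1', 'Zebra', '1', 'Dog', '0')) AS webhooks
    |""".stripMargin)

val result = zeroOutExcept(ddf, "webhooks", Seq("Zebra", "Dog"))
result.show(false)
result.printSchema()
```

With this sketch, Elephant and Lion become "0" in every array element while Zebra and Dog keep their original values. The same logic applies if the array holds more than one struct per row.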

