Loading nested array into Spark DataFrame column
I have a nested array that looks like this:
a = [[1,2],[2,3]]
I have a streaming DataFrame that looks like this:
+----------+-----+
|system    |level|
+----------+-----+
|Test1     |1    |
|Test2     |3    |
+----------+-----+
I want to add the array to a third column as a nested array:
+----------+-----+----------------+
|system    |level|Data            |
+----------+-----+----------------+
|Test1     |1    |[[1, 2], [2, 3]]|
|Test2     |3    |[[1, 2], [2, 3]]|
+----------+-----+----------------+
I tried the column and array functions, but I am not sure how to build a nested array with them.
Any help would be appreciated.
You can add the array as a new column by cross-joining df with a one-row DataFrame that holds it. Here df is an example DataFrame with date, hour and value columns:
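a = [[1,2],[2,3]]
# one-row DataFrame; its single column defaults to "value", so rename it to avoid clashing with df.value
arr_df = spark.createDataFrame([a], "array<array<bigint>>").toDF("data")
df.crossJoin(arr_df).show()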
+-------------------+----+------+----------------+
| date|hour| value| data|
+-------------------+----+------+----------------+
|1984-01-01 00:00:00| 1|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00| 2|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00| 3|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00| 4|638.55|[[1, 2], [2, 3]]|
|1984-01-01 00:00:00| 5|638.55|[[1, 2], [2, 3]]|
+-------------------+----+------+----------------+
In the Scala API, we can use the typedLit function to add Array or Map values as a column.
// Ref: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
Here is sample code that adds an Array as a column value:
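import org.apache.spark.sql.functions.typedLit
import spark.implicits._  // needed for .toDF; assumes a SparkSession named spark

// A Seq of Seqs gives array<array<int>>; a Seq of tuples would give array<struct<_1:int,_2:int>>
val a = Seq(Seq(1, 2), Seq(2, 3))
val df1 = Seq(("Test1", 1), ("Test3", 3)).toDF("a", "b")
df1.withColumn("new_col", typedLit(a)).show()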
// Output
+-----+---+----------------+
| a| b| new_col|
+-----+---+----------------+
|Test1| 1|[[1, 2], [2, 3]]|
|Test3| 3|[[1, 2], [2, 3]]|
+-----+---+----------------+
I hope this helps.
If you want to add the same array to all rows, you can use typedLit from the SQL functions (org.apache.spark.sql.functions). See this answer:
https://stackoverflow.com/a/32788650/12365294