Generate a single row dataframe for lookup
This is a follow-up to a question I posted previously.
Step 1:
scala> spark.sql("select map('s1', 'p1', 's2', 'p2', 's3', 'p3') as lookup").show()
+--------------------+
|              lookup|
+--------------------+
|[s1 -> p1, s2 -> ...|
+--------------------+
Step 2:
scala> val df = Seq(("s1", "p1"), ("s2", "p2"), ("s3", "p3")).toDF("s", "p")
df: org.apache.spark.sql.DataFrame = [s: string, p: string]
scala> df.show()
+---+---+
|  s|  p|
+---+---+
| s1| p1|
| s2| p2|
| s3| p3|
+---+---+
Step 3:
scala> val df1 = df.selectExpr("map(s,p) lookup")
df1: org.apache.spark.sql.DataFrame = [lookup: map<string,string>]
scala> df1.show()
+----------+
|    lookup|
+----------+
|[s1 -> p1]|
|[s2 -> p2]|
|[s3 -> p3]|
+----------+
In step 3, I expect the same result that I get in step 1: a single row holding one map. How can I achieve that?
The key column and the value column should each be aggregated into an array before the two arrays are merged into a map.
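import org.apache.spark.sql.functions._
// Collect all keys and all values into two arrays (a single row), then zip them into one map.
df.agg(collect_list("s").as("s"), collect_list("p").as("p"))
  .select(map_from_arrays(col("s"), col("p")).as("lookup"))
  .show(false)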
Output:
+------------------------------+
|lookup                        |
+------------------------------+
|[s1 -> p1, s2 -> p2, s3 -> p3]|
+------------------------------+
Without the collect_list calls, each row is converted into its own single-entry map, which is what happens in step 3.
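As a possible variation, the key and value can be kept together while aggregating by collecting one array of (key, value) structs and building the map from that. The sketch below is only an illustration, not part of the original answer: it assumes Spark 2.4 or later (where map_from_entries is available), and the names lookupDf and lookup are made up for the example. It builds the dataframe with createDataFrame so it does not depend on the shell's implicits.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, map_from_entries, struct}

// Reuses the spark-shell session if one exists; otherwise starts a local one.
val session = SparkSession.builder().master("local[*]").appName("lookup-sketch").getOrCreate()

// Same sample data as in step 2.
val df = session.createDataFrame(Seq(("s1", "p1"), ("s2", "p2"), ("s3", "p3"))).toDF("s", "p")

// Pack each row into a (key, value) struct, collect all structs into one array,
// then turn that array of entries into a single map column.
// (lookupDf is a hypothetical name used only in this sketch.)
val lookupDf = df.agg(
  map_from_entries(collect_list(struct(col("s"), col("p")))).as("lookup")
)

// Optionally pull the single row back to the driver as a plain Scala Map.
val lookup: Map[String, String] = lookupDf.head().getMap[String, String](0).toMap
```

With the sample data, lookup("s2") gives "p2". Collecting the map to the driver like this is only sensible when the lookup table is small.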