Sort by value in map type column for each row in spark dataframe
Input DataSet

a  b  c
1  2  B:2-C:3
1  1  C:1-D:2
2  2  F:1
Expected Output DataSet

a  b  c
1  2  C:3-B:2
1  1  D:2-C:1
2  2  F:1
So, can anyone help me with this? With only basic knowledge of Spark and Scala, I am unable to solve this for now.
Below should do the job:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object VerySimpleApp extends App {
  val spark = SparkSession.builder.appName("test-app").master("local[1]").getOrCreate()

  case class Record(a: Int, b: Int, c: String)

  val rows = Seq(
    Record(1, 1, "C:1-D:2"),
    Record(1, 2, "B:2-C:3"),
    Record(2, 2, "F:1")
  )

  val df = spark.createDataFrame(rows)

  df.selectExpr("a", "b", "c", "split(c, '[:-]') cs")                     // "B:2-C:3" -> [B, 2, C, 3]
    .selectExpr("*", "element_at(cs, 2) col1", "element_at(cs, 4) col2")  // extract the numeric values
    .orderBy(col("col1").desc, col("col2").desc)
    .show()
}
+---+---+-------+------------+----+----+
| a| b| c| cs|col1|col2|
+---+---+-------+------------+----+----+
| 1| 2|B:2-C:3|[B, 2, C, 3]| 2| 3|
| 1| 1|C:1-D:2|[C, 1, D, 2]| 1| 2|
| 2| 2| F:1| [F, 1]| 1|null|
+---+---+-------+------------+----+----+
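Note that the code above orders the rows of the DataFrame by the extracted values, while the expected output reorders the key:value pairs within each row's c string by value, descending. If that per-row reordering is the goal, a plain Scala helper (hypothetical name sortByValueDesc, assuming pairs are separated by '-' and key from value by ':') could be registered as a UDF, a minimal sketch:

```scala
// Reorder the key:value pairs inside one string by numeric value, descending.
// "B:2-C:3" -> "C:3-B:2"; a single pair like "F:1" is returned unchanged.
def sortByValueDesc(s: String): String =
  s.split("-")
    .map(_.split(":"))            // "B:2" -> Array("B", "2")
    .sortBy(kv => -kv(1).toInt)   // sort pairs by their numeric value, descending
    .map(_.mkString(":"))
    .mkString("-")

// In Spark this could be applied per row, e.g.:
//   val sortC = org.apache.spark.sql.functions.udf(sortByValueDesc _)
//   df.withColumn("c", sortC(col("c")))
```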