
Sort by value in map type column for each row in spark dataframe

Input DataSet

a  b  c
1  2  B:2-C:3
1  1  C:1-D:2
2  2  F:1

Expected Output DataSet

a  b  c
1  2  C:3-B:2
1  1  D:2-C:1
2  2  F:1

So,

  1. The c column needs to be sorted in descending order of value.
  2. Each key-value pair in c is separated by -.
  3. There might also be only one key-value pair in the c column.

Can anyone help me with this? With only basic knowledge of Spark and Scala, I am unable to solve this for now.

Below should do the job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object VerySimpleApp extends App {

  val spark = SparkSession.builder.appName("test-app").master("local[1]").getOrCreate()

  // Sample data matching the input dataset from the question.
  case class Record(a: Int, b: Int, c: String)
  val rows = Seq(
    Record(1, 1, "C:1-D:2"),
    Record(1, 2, "B:2-C:3"),
    Record(2, 2, "F:1")
  )
  val df = spark.createDataFrame(rows)

  // Split c on both ":" and "-" so "B:2-C:3" becomes [B, 2, C, 3],
  // extract the numeric values (element_at is 1-based), and order the
  // rows by those values in descending order.
  df.selectExpr("a", "b", "c", "split(c, '[:-]') cs").
    selectExpr("*", "element_at(cs, 2) col1", "element_at(cs, 4) col2").
    orderBy(col("col1").desc, col("col2").desc).
    show()

}

+---+---+-------+------------+----+----+
|  a|  b|      c|          cs|col1|col2|
+---+---+-------+------------+----+----+
|  1|  2|B:2-C:3|[B, 2, C, 3]|   2|   3|
|  1|  1|C:1-D:2|[C, 1, D, 2]|   1|   2|
|  2|  2|    F:1|      [F, 1]|   1|null|
+---+---+-------+------------+----+----+
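
Note that the snippet above orders the rows of the DataFrame by the extracted values, while the question asks for the pairs inside c to be reordered within each row. A minimal sketch of that per-row sort using a plain Scala UDF is shown below; it is not part of the original answer, the name sortPairsDesc is illustrative, it assumes every value in c is an integer, and it reuses the df defined above.

import org.apache.spark.sql.functions.{col, udf}

// Reorder the "-"-separated key:value pairs of one string by value, descending.
// Assumes every pair has the form key:intValue; a single pair such as "F:1"
// comes back unchanged. (sortPairsDesc is an illustrative name, not from the answer.)
val sortPairsDesc = udf { (s: String) =>
  s.split("-")
    .map { kv =>
      val Array(k, v) = kv.split(":")
      (k, v.toInt)
    }
    .sortBy { case (_, v) => -v }          // descending by numeric value
    .map { case (k, v) => s"$k:$v" }
    .mkString("-")
}

// Reusing the df built in the answer above:
df.withColumn("c", sortPairsDesc(col("c"))).show()
// The three rows should show c as D:2-C:1, C:3-B:2 and F:1 respectively.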
