How to transpose rows to columns using Spark-SQL?

I have the following data in my table t1:

col1    | col2   |
sess-1  | read   |
sess-1  | meet   |
sess-1  | walk   |
sess-2  | watch  |
sess-2  | sleep  |
sess-2  | run    |
sess-2  | drive  |

Expected output:

col1   | col2                  |
sess-1 | read,meet,walk        |
sess-2 | watch,sleep,run,drive |

I am using Spark 1.4.0.

Take a look at Spark's aggregateByKey:

scala> val babyNamesCSV = sc.parallelize(List(("David", 6), ("Abby", 4), ("David", 5), ("Abby", 5)))
babyNamesCSV: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:12

// first function folds a value into the per-partition accumulator,
// second function merges accumulators from different partitions
scala> babyNamesCSV.aggregateByKey(0)((acc, v) => acc + v, (a, b) => a + b).collect
res1: Array[(String, Int)] = Array((Abby,9), (David,11))

The example above should help illustrate how it works.

Or see Aggregator: https://spark.apache.org/docs/0.6.0/api/core/spark/Aggregator.html
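For the data in the question, the same aggregateByKey idea can collect the strings per key instead of summing integers. What follows is only a sketch: the object name `TransposeWithAggregateByKey`, the helper `transpose`, and the `local[1]` master are assumptions made for illustration. The order of values within each key is not guaranteed once the data spans several partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the original answer.
object TransposeWithAggregateByKey {

  // Collects all values for each key and joins them with commas.
  def transpose(sc: SparkContext, rows: Seq[(String, String)]): Map[String, String] =
    sc.parallelize(rows)
      .aggregateByKey(Vector.empty[String])(
        (acc, v) => acc :+ v, // fold one value into the partition-local accumulator
        (a, b) => a ++ b      // merge accumulators coming from different partitions
      )
      .mapValues(_.mkString(","))
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // Assumed local setup; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("transpose-aggregateByKey").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val rows = Seq(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")
      )
      transpose(sc, rows).foreach { case (col1, col2) => println(s"$col1 | $col2") }
    } finally {
      sc.stop()
    }
  }
}
```

Compared with groupByKey, aggregateByKey combines values inside each partition before the shuffle, which matters more when the per-key result is smaller than the raw values (sums, counts) than when every value is kept, as it is here.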

// create RDD data
scala> val data = sc.parallelize(List(("sess-1","read"), ("sess-1","meet"), 
    ("sess-1","walk"), ("sess-2","watch"),("sess-2","sleep"), 
    ("sess-2","run"),("sess-2","drive")))

// groupByKey returns an Iterable[String] (a CompactBuffer) per key
scala> val dataCB = data.groupByKey()

//map CompactBuffer to List
scala> val tx = dataCB.map{case (col1,col2)  => (col1,col2.toList)}.collect

data: org.apache.spark.rdd.RDD[(String, String)] =
ParallelCollectionRDD[211] at parallelize at <console>:26

dataCB: org.apache.spark.rdd.RDD[(String, Iterable[String])] =
ShuffledRDD[212] at groupByKey at <console>:30

tx: Array[(String, List[String])] = Array((sess-1,List(read, meet,
walk)), (sess-2,List(watch, sleep, run, drive)))
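The result above holds a List per key, while the expected output in the question shows comma-separated strings. One way to close that gap, sketched here with a hypothetical helper `joinGrouped` and an assumed `local[1]` master, is to join each group with mkString:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names are for illustration only.
object JoinGroupedValues {

  // Turns each key's grouped values into a single comma-separated string.
  def joinGrouped(grouped: RDD[(String, Iterable[String])]): RDD[(String, String)] =
    grouped.mapValues(_.mkString(","))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("join-grouped").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(List(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")))
      joinGrouped(data.groupByKey()).collect().foreach {
        case (col1, col2) => println(s"$col1 | $col2")
      }
    } finally {
      sc.stop()
    }
  }
}
```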

// groupByKey and the map to List can also be done in one statement
scala> val tx = data.groupByKey().map { case (col1, col2) => (col1, col2.toList) }.collect
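Since the question asks about Spark SQL specifically, the same grouping can also be written as a query. This sketch assumes Spark 2.0 or later, where `SparkSession`, `createOrReplaceTempView`, and the built-in `collect_list` and `concat_ws` functions are available; it is not expected to run unchanged on Spark 1.4.0. The object name and the `local[1]` master are hypothetical, and the order of values inside each joined string is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, concat_ws}

// Hypothetical example object; assumes Spark 2.0+ rather than the 1.4.0 in the question.
object TransposeWithSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("transpose-sql")
      .master("local[1]") // assumed local setup
      .getOrCreate()
    import spark.implicits._

    try {
      // Rebuild table t1 from the question as a DataFrame.
      val t1 = Seq(
        ("sess-1", "read"), ("sess-1", "meet"), ("sess-1", "walk"),
        ("sess-2", "watch"), ("sess-2", "sleep"), ("sess-2", "run"), ("sess-2", "drive")
      ).toDF("col1", "col2")
      t1.createOrReplaceTempView("t1")

      // SQL form: gather col2 per col1, then join the array with commas.
      val viaSql = spark.sql(
        "SELECT col1, concat_ws(',', collect_list(col2)) AS col2 FROM t1 GROUP BY col1")

      // Equivalent DataFrame API form.
      val viaApi = t1.groupBy("col1")
        .agg(concat_ws(",", collect_list("col2")).as("col2"))

      viaSql.show(false)
      viaApi.show(false)
    } finally {
      spark.stop()
    }
  }
}
```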
