
Transformation of data into a list of objects of a class in Spark Scala

I am trying to write Spark transformation code to convert the data below into a list of objects of the following class. I am totally new to Scala and Spark; I tried splitting the data and putting the fields into a case class, but I was unable to merge them back together. I would appreciate your help with this.

Data:

FirstName,LastName,Country,match,Goals
Cristiano,Ronaldo,Portugal,Match1,1
Cristiano,Ronaldo,Portugal,Match2,1
Cristiano,Ronaldo,Portugal,Match3,0
Cristiano,Ronaldo,Portugal,Match4,2
Lionel,Messi,Argentina,Match1,1
Lionel,Messi,Argentina,Match2,2
Lionel,Messi,Argentina,Match3,1
Lionel,Messi,Argentina,Match4,2

Desired output:

PlayerStats { String FirstName,
    String LastName,
    String Country,
    Map <String,Int> matchandscore
}

Assuming you have already loaded the data into an RDD[String] named data:

case class PlayerStats(FirstName: String, LastName: String, Country: String, matchandscore: Map[String, Int])

import org.apache.spark.rdd.RDD

val result: RDD[PlayerStats] = data
  .filter(!_.startsWith("FirstName"))     // drop the header line
  .map(_.split(","))                      // split each CSV line
  .map { case Array(fn, ln, cntry, mn, g) =>   // map into case classes
    PlayerStats(fn, ln, cntry, Map(mn -> g.toInt))
  }
  .keyBy(p => (p.FirstName, p.LastName))  // key by player
  .reduceByKey((p1, p2) => p1.copy(matchandscore = p1.matchandscore ++ p2.matchandscore))
  .map(_._2)                              // drop the key

First convert each line into a key/value pair, say (Cristiano, rest of data); then apply groupByKey (reduceByKey also works); finally, convert the grouped key/value data into your class by filling in the values. The well-known word-count program is a good template for this.
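The key-by/merge idea above can be sketched with plain Scala collections, no Spark needed, to see the logic before porting it to an RDD (groupBy here plays the role of groupByKey). The sample data and field names below are illustrative:

```scala
// Each player's rows are grouped by (firstName, lastName), then their
// single-entry match maps are merged into one map per player.
case class PlayerStats(firstName: String, lastName: String, country: String, matchAndScore: Map[String, Int])

val lines = List(
  "Cristiano,Ronaldo,Portugal,Match1,1",
  "Cristiano,Ronaldo,Portugal,Match2,1",
  "Lionel,Messi,Argentina,Match1,1"
)

val stats: List[PlayerStats] = lines
  .map(_.split(","))
  .map { case Array(fn, ln, c, m, g) => PlayerStats(fn, ln, c, Map(m -> g.toInt)) }
  .groupBy(p => (p.firstName, p.lastName))   // like groupByKey on an RDD
  .values
  .map(_.reduce((a, b) => a.copy(matchAndScore = a.matchAndScore ++ b.matchAndScore)))
  .toList
// one PlayerStats per player, with all matches merged into one map
```

In Spark the same merge step is the reduceByKey call shown in the first answer; the collection version is just easier to test locally.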

http://spark.apache.org/examples.html

You could try something as follows:

val file = sc.textFile("myfile.csv")

import spark.implicits._   // needed for toDF (spark is the SparkSession)

val df = file.map(line => line.split(",")).                        // split line by comma
              filter(lineSplit => lineSplit(0) != "FirstName").    // filter out header row
              map(lineSplit => {                                   // transform lines
                (lineSplit(0), lineSplit(1), lineSplit(2), Map(lineSplit(3) -> lineSplit(4).toInt))}).
              toDF("FirstName", "LastName", "Country", "MatchAndScore")

df.schema
// res34: org.apache.spark.sql.types.StructType = StructType(StructField(FirstName,StringType,true), StructField(LastName,StringType,true), StructField(Country,StringType,true), StructField(MatchAndScore,MapType(StringType,IntegerType,false),true))

df.show

+---------+--------+---------+----------------+
|FirstName|LastName|  Country|   MatchAndScore|
+---------+--------+---------+----------------+
|Cristiano| Ronaldo| Portugal|Map(Match1 -> 1)|
|Cristiano| Ronaldo| Portugal|Map(Match2 -> 1)|
|Cristiano| Ronaldo| Portugal|Map(Match3 -> 0)|
|Cristiano| Ronaldo| Portugal|Map(Match4 -> 2)|
|   Lionel|   Messi|Argentina|Map(Match1 -> 1)|
|   Lionel|   Messi|Argentina|Map(Match2 -> 2)|
|   Lionel|   Messi|Argentina|Map(Match3 -> 1)|
|   Lionel|   Messi|Argentina|Map(Match4 -> 2)|
+---------+--------+---------+----------------+
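Note that this DataFrame still has one row (and one single-entry map) per match, not one object per player as the question asks. The remaining merge step can be sketched with plain Scala collections; the Row case class and sample values below are illustrative, not Spark's Row type:

```scala
// One record per match, as in the DataFrame above.
case class RowT(firstName: String, lastName: String, country: String, matchAndScore: Map[String, Int])

val rows = List(
  RowT("Cristiano", "Ronaldo", "Portugal", Map("Match1" -> 1)),
  RowT("Cristiano", "Ronaldo", "Portugal", Map("Match2" -> 1)),
  RowT("Lionel", "Messi", "Argentina", Map("Match1" -> 1)),
  RowT("Lionel", "Messi", "Argentina", Map("Match2" -> 2))
)

// Group by player and flatten the per-match maps into one map each.
val combined: List[RowT] = rows
  .groupBy(r => (r.firstName, r.lastName, r.country))
  .map { case ((fn, ln, c), rs) => RowT(fn, ln, c, rs.flatMap(_.matchAndScore).toMap) }
  .toList
```

In Spark this final step could be done by going back to the typed Dataset/RDD API and reducing by player key, as the first answer does.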
