Transformation of data into a list of objects of a class in Spark Scala
I am trying to write Spark transformation code to convert the data below into a list of objects of the following class. I am totally new to Scala and Spark; I tried splitting the data and putting it into a case class, but I was unable to merge the per-match rows back together. I would appreciate your help with this.
Data:
FirstName,LastName,Country,match,Goals
Cristiano,Ronaldo,Portugal,Match1,1
Cristiano,Ronaldo,Portugal,Match2,1
Cristiano,Ronaldo,Portugal,Match3,0
Cristiano,Ronaldo,Portugal,Match4,2
Lionel,Messi,Argentina,Match1,1
Lionel,Messi,Argentina,Match2,2
Lionel,Messi,Argentina,Match3,1
Lionel,Messi,Argentina,Match4,2
Desired output:
PlayerStats {
    String FirstName,
    String LastName,
    String Country,
    Map<String, Int> matchandscore
}
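Written as a Scala case class (a sketch; field names follow the question, and the map collects each match name to the goals scored in it):

```scala
// Target shape for the aggregated result; field names taken from the question.
case class PlayerStats(
  FirstName: String,
  LastName: String,
  Country: String,
  matchandscore: Map[String, Int]
)

// Example instance for one player:
val ronaldo = PlayerStats("Cristiano", "Ronaldo", "Portugal",
  Map("Match1" -> 1, "Match2" -> 1, "Match3" -> 0, "Match4" -> 2))
```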
Assuming you have already loaded the data into an RDD[String] named data:
case class PlayerStats(FirstName: String, LastName: String, Country: String, matchandscore: Map[String, Int])

val result: RDD[PlayerStats] = data
  .filter(!_.startsWith("FirstName"))   // remove header
  .map(_.split(",")).map {              // map into case classes
    case Array(fn, ln, cntry, mn, g) => PlayerStats(fn, ln, cntry, Map(mn -> g.toInt))
  }
  .keyBy(p => (p.FirstName, p.LastName)) // key by player
  .reduceByKey((p1, p2) => p1.copy(matchandscore = p1.matchandscore ++ p2.matchandscore))
  .map(_._2)                             // remove key
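The same merge logic can be checked locally with plain Scala collections, no Spark needed — a sketch using the sample rows from the question, where `groupBy` plays the role of `keyBy` + `reduceByKey`:

```scala
case class PlayerStats(FirstName: String, LastName: String, Country: String,
                       matchandscore: Map[String, Int])

val lines = Seq(
  "FirstName,LastName,Country,match,Goals", // header
  "Cristiano,Ronaldo,Portugal,Match1,1",
  "Cristiano,Ronaldo,Portugal,Match2,1",
  "Cristiano,Ronaldo,Portugal,Match3,0",
  "Cristiano,Ronaldo,Portugal,Match4,2",
  "Lionel,Messi,Argentina,Match1,1",
  "Lionel,Messi,Argentina,Match2,2",
  "Lionel,Messi,Argentina,Match3,1",
  "Lionel,Messi,Argentina,Match4,2"
)

val result: List[PlayerStats] = lines
  .filterNot(_.startsWith("FirstName"))     // drop the header
  .map(_.split(","))
  .collect { case Array(fn, ln, c, m, g) => // one record per row
    PlayerStats(fn, ln, c, Map(m -> g.toInt))
  }
  .groupBy(p => (p.FirstName, p.LastName))  // key by player
  .values
  .map(_.reduce((a, b) =>                   // merge the per-match maps
    a.copy(matchandscore = a.matchandscore ++ b.matchandscore)))
  .toList
```

The `copy` + `++` step is exactly what `reduceByKey` applies pairwise in the Spark version above.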
Firstly, convert each line into a key-value pair, say (Cristiano, rest of data). Then apply groupByKey or reduceByKey — either can work. After the grouping, convert the key-value pairs into instances of your class by filling in the values. The famous word-count program is a helpful reference for this pattern:
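The word-count pattern this answer refers to — pair each record with a key, then reduce the values per key — can be sketched locally with plain Scala collections (in Spark, the `groupBy`/`map` pair below would be a single `reduceByKey` call):

```scala
// Each record keyed by player; the value is a one-entry match map.
val pairs = Seq(
  (("Cristiano", "Ronaldo"), Map("Match1" -> 1)),
  (("Cristiano", "Ronaldo"), Map("Match2" -> 1)),
  (("Lionel", "Messi"),      Map("Match1" -> 1))
)

// reduceByKey equivalent: group by the key, then merge the maps per key.
val reduced: Map[(String, String), Map[String, Int]] =
  pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(_ ++ _) }
```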
http://spark.apache.org/examples.html
You could try something as follows:
import spark.implicits._ // needed for toDF on an RDD of tuples (Spark 2.x)

val file = sc.textFile("myfile.csv")
val df = file.map(line => line.split(",")).          // split each line by comma
  filter(lineSplit => lineSplit(0) != "FirstName"). // filter out the header row
  map(lineSplit =>                                  // build one tuple per line
    (lineSplit(0), lineSplit(1), lineSplit(2), Map(lineSplit(3) -> lineSplit(4).toInt))).
  toDF("FirstName", "LastName", "Country", "MatchAndScore")
df.schema
// res34: org.apache.spark.sql.types.StructType = StructType(StructField(FirstName,StringType,true), StructField(LastName,StringType,true), StructField(Country,StringType,true), StructField(MatchAndScore,MapType(StringType,IntegerType,false),true))
df.show
+---------+--------+---------+----------------+
|FirstName|LastName| Country| MatchAndScore|
+---------+--------+---------+----------------+
|Cristiano| Ronaldo| Portugal|Map(Match1 -> 1)|
|Cristiano| Ronaldo| Portugal|Map(Match2 -> 1)|
|Cristiano| Ronaldo| Portugal|Map(Match3 -> 0)|
|Cristiano| Ronaldo| Portugal|Map(Match4 -> 2)|
| Lionel| Messi|Argentina|Map(Match1 -> 1)|
| Lionel| Messi|Argentina|Map(Match2 -> 2)|
| Lionel| Messi|Argentina|Map(Match3 -> 1)|
| Lionel| Messi|Argentina|Map(Match4 -> 2)|
+---------+--------+---------+----------------+