How to convert a nested object from an RDD Row to a custom object
I'm trying to learn some Scala/Spark by working through a basic Spark integration example. My setup: I have a MongoDB instance running locally, I pull some data from it, and I build an RDD. The documents in the database have the following structure:
{
  "_id": 0,
  "name": "aimee Zank",
  "scores": [
    {
      "score": 1.463179736705023,
      "type": "exam"
    },
    {
      "score": 11.78273309957772,
      "type": "quiz"
    },
    {
      "score": 35.8740349954354,
      "type": "homework"
    }
  ]
}
Here is some code:
val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("simple-app")
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-mongo")
  .config(conf)
  .config("spark.mongodb.output.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
  .config("spark.mongodb.input.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
  .getOrCreate()

// Read the MongoDB collection into a DataFrame
val df = MongoSpark.load(sparkSession)
val dataRdd: RDD[Row] = df.rdd

dataRdd.foreach(row => println(row.getValuesMap[Any](row.schema.fieldNames)))
The code above prints:
Map(_id -> 0, name -> aimee Zank, scores -> WrappedArray([1.463179736705023,exam], [11.78273309957772,quiz], [35.8740349954354,homework]))
Map(_id -> 1, name -> Aurelia Menendez, scores -> WrappedArray([60.06045071030959,exam], [52.79790691903873,quiz], [71.76133439165544,homework]))
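The `WrappedArray` entries above are themselves Rows (structs). Printing the DataFrame's schema makes the nesting explicit; with this collection the output should look roughly like the comment below (exact types and nullability depend on what the connector infers):

```scala
// Inspect the inferred schema: "scores" is an array of structs,
// which is why each element prints as a bracketed Row.
df.printSchema()
// root
//  |-- _id: integer (nullable = true)
//  |-- name: string (nullable = true)
//  |-- scores: array (nullable = true)
//  |    |-- element: struct (containsNull = true)
//  |    |    |-- score: double (nullable = true)
//  |    |    |-- type: string (nullable = true)
```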
In the end, my problem is converting this data to:
case class Student(id: Long, name: String, scores: Scores)
case class Scores(@JsonProperty("scores") scores: List[Score])
case class Score(
  @JsonProperty("score") score: Double,
  @JsonProperty("type") scoreType: String
)
To conclude: the problem is that I cannot convert the data from the RDD into Student objects. The most problematic part for me is the nested 'scores' object. Please help me understand how this should be done.
I played with it a bit more and ended up with the following solution:
object MainClass {

  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("simple-app")
    val sparkSession = SparkSession.builder()
      .appName("example-spark-scala-read-and-write-from-mongo")
      .config(conf)
      .config("spark.mongodb.output.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
      .config("spark.mongodb.input.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
      .getOrCreate()

    // Read the MongoDB collection into a DataFrame
    val df = MongoSpark.load(sparkSession)
    val dataRdd: RDD[Row] = df.rdd

    val students: List[Student] =
      dataRdd
        .collect()
        .map(row => Student(row.getInt(0), row.getString(1), createScoresObject(row)))
        .toList
  }

  def createScoresObject(row: Row): Scores =
    Scores(getAllScoresFromWrappedArray(row).map(x => Score(x.getDouble(0), x.getString(1))).toList)

  def getAllScoresFromWrappedArray(row: Row): mutable.WrappedArray[GenericRowWithSchema] =
    getScoresWrappedArray(row).map(x => x.asInstanceOf[GenericRowWithSchema])

  def getScoresWrappedArray(row: Row): mutable.WrappedArray[AnyVal] =
    row.getAs[mutable.WrappedArray[AnyVal]](2)
}
case class Student(id: Long, name: String, scores: Scores)
case class Scores(scores: List[Score])
case class Score (score: Double, scoreType: String)
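The casts to `GenericRowWithSchema` can be avoided: elements of a nested array-of-struct column are plain `Row`s, so they can be read with `getSeq[Row]` and field-name lookups. A sketch (untested against a live Mongo source; `rowToStudent` is a hypothetical helper name):

```scala
import org.apache.spark.sql.Row

// Read the nested "scores" column as Seq[Row] and convert each
// struct element with typed, name-based getters.
def rowToStudent(row: Row): Student = {
  val scores = row.getSeq[Row](row.fieldIndex("scores"))
    .map(r => Score(r.getAs[Double]("score"), r.getAs[String]("type")))
    .toList
  // _id may be inferred as Int or Long depending on the data, so
  // go through Number to widen safely.
  Student(row.getAs[Number]("_id").longValue(), row.getAs[String]("name"), Scores(scores))
}
```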
But I would be glad to know if there is a more elegant solution.
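One more idiomatic option, sketched here under the assumption that the case-class field names match the Mongo column names (note `_id` instead of `id`, and backticks around `type` because it is a Scala keyword): skip the RDD entirely and let Spark's encoders map the DataFrame to a typed Dataset. Untested against a live database; the `*Record` names are illustrative:

```scala
import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.Dataset
import sparkSession.implicits._

// Field names mirror the document schema: _id, name, scores[].score, scores[].type.
case class ScoreRecord(score: Double, `type`: String)
case class StudentRecord(_id: Long, name: String, scores: Seq[ScoreRecord])

// Spark's encoder handles the nested array of structs automatically.
val students: Dataset[StudentRecord] = MongoSpark.load(sparkSession).as[StudentRecord]
```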