
How to convert nested object from rdd row to some custom object

I'm trying to learn some Scala/Spark and to practice with a basic Spark integration example. My problem is that I have a MongoDB instance running locally. I'm pulling some data from it and making an RDD. The data in the db has a structure like this:

{
    "_id": 0,
    "name": "aimee Zank",
    "scores": [
        {
            "score": 1.463179736705023,
            "type": "exam"
        },
        {
            "score": 11.78273309957772,
            "type": "quiz"
        },
        {
            "score": 35.8740349954354,
            "type": "homework"
        }
    ]
}

Here is some code:

import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("simple-app")
val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-mongo")
  .config(conf)
  .config("spark.mongodb.output.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
  .config("spark.mongodb.input.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
  .getOrCreate()

// Reading the Mongodb collection into a dataframe
val df = MongoSpark.load(sparkSession)
val dataRdd: RDD[Row] = df.rdd

dataRdd.foreach(row => println(row.getValuesMap[Any](row.schema.fieldNames)))

The code above prints the following:

Map(_id -> 0, name -> aimee Zank, scores -> WrappedArray([1.463179736705023,exam], [11.78273309957772,quiz], [35.8740349954354,homework]))
Map(_id -> 1, name -> Aurelia Menendez, scores -> WrappedArray([60.06045071030959,exam], [52.79790691903873,quiz], [71.76133439165544,homework]))

In the end, I have a problem converting this data to:

case class Student(id: Long, name: String, scores: Scores)

case class Scores(@JsonProperty("scores") scores: List[Score])

case class Score (
                 @JsonProperty("score") score: Double,
                 @JsonProperty("type") scoreType: String
)

To conclude: the problem is that I cannot convert the data from the RDD to the Student object. The most problematic part for me is the nested 'scores' object. Please help me understand how this should be done.

I played with it a bit more and ended up with the following solution:

import scala.collection.mutable

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.mongodb.spark.MongoSpark
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema

object MainClass {

  def main(args: Array[String]): Unit = {

    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("simple-app")
    val sparkSession = SparkSession.builder()
      .appName("example-spark-scala-read-and-write-from-mongo")
      .config(conf)
      .config("spark.mongodb.output.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
      .config("spark.mongodb.input.uri", "mongodb://sproot:12345@172.18.0.3:27017/spdb.students")
      .getOrCreate()

    val objectMapper = new ObjectMapper()
    objectMapper.registerModule(DefaultScalaModule)

    // Reading Mongodb collection into a dataframe
    val df = MongoSpark.load(sparkSession)
    val dataRdd: RDD[Row] = df.rdd

    val students: List[Student] =
      dataRdd
        .collect()
        .map(row => Student(row.getInt(0), row.getString(1), createScoresObject(row))).toList
    println()
  }

  def createScoresObject(row: Row): Scores = {
    Scores(getAllScoresFromWrappedArray(row).map(x => Score(x.getDouble(0), x.getString(1))).toList)
  }

  def getAllScoresFromWrappedArray(row: Row): mutable.WrappedArray[GenericRowWithSchema] = {
    getScoresWrappedArray(row).map(x => x.asInstanceOf[GenericRowWithSchema])
  }

  def getScoresWrappedArray(row: Row): mutable.WrappedArray[AnyVal] = {
    row.getAs[mutable.WrappedArray[AnyVal]](2)
  }
}

case class Student(id: Long, name: String, scores: Scores)

case class Scores(scores: List[Score])

case class Score (score: Double, scoreType: String)
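A somewhat shorter variant of the same Row handling is to pattern match on Row instead of casting through GenericRowWithSchema. This is only a sketch; it assumes _id comes back as an Int, as the getInt(0) call above does:

```scala
import org.apache.spark.sql.Row

case class Score(score: Double, scoreType: String)
case class Scores(scores: List[Score])
case class Student(id: Long, name: String, scores: Scores)

// Convert one Row shaped like {_id, name, scores: array<struct<score, type>>}
// to a Student. Row.unapplySeq lets us destructure both the outer row and
// the nested struct rows, so no WrappedArray casts are needed.
def toStudent(row: Row): Student = row match {
  case Row(id: Int, name: String, scores: Seq[_]) =>
    Student(
      id.toLong,
      name,
      Scores(scores.collect {
        case Row(score: Double, scoreType: String) => Score(score, scoreType)
      }.toList)
    )
}
```

With this in place the driver code reduces to `dataRdd.collect().map(toStudent).toList`.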

But I would be glad to know if there is a more elegant solution.
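One route that looks more elegant is to skip the RDD entirely and let Spark's Dataset encoders do the conversion. This is a sketch under the assumption that the case-class field names are renamed to mirror the Mongo fields exactly (`_id`, `type`), since that is what the encoder matches on; a locally built DataFrame stands in for the Mongo-loaded one here, but the same `.as[Student]` call should work on the DataFrame returned by MongoSpark.load:

```scala
import org.apache.spark.sql.SparkSession

// Case classes mirroring the document schema directly, so encoders apply.
case class Score(score: Double, `type`: String)
case class Student(_id: Long, name: String, scores: Seq[Score])

val spark = SparkSession.builder().master("local[*]").appName("encoder-sketch").getOrCreate()
import spark.implicits._

// A stand-in DataFrame with the same shape as the Mongo collection.
val df = Seq(Student(0L, "aimee Zank", Seq(Score(1.463179736705023, "exam")))).toDF()

// The encoder handles the nested array<struct> column automatically.
val students: List[Student] = df.as[Student].collect().toList
spark.stop()
```

This removes all manual Row handling, at the cost of keeping the case-class field names tied to the document field names (a `type` field needs backticks in Scala).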
