How do Scala object members work with a Spark RDD?

I have a Spark application that writes its results to Redis.

It works fine in local mode, but in yarn-cluster mode it cannot connect to the Redis host that I pass as args(0), for example 10.242.10.100.

redisHost remains unchanged at 127.0.0.1.

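import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.slf4j.{Logger, LoggerFactory}
import redis.clients.jedis.Jedis

object TestSparkClosure {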
  val logger: Logger = LoggerFactory.getLogger(TestSparkClosure.getClass)
  var redisHost = "127.0.0.1"
  var redisPort = 6379

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TestSparkClosure")

    if (args.length > 0) {
      redisHost = args(0)
    } else {
      conf.setMaster("local")
    }
    val sparkContext = new SparkContext(conf)
    var rdd = getRdd(sparkContext)
    rdd.foreachPartition(partitionOfRecords => {
      logger.info("host:port:" + redisHost + ":" + redisPort.toString)
      val jedis = new Jedis(redisHost, redisPort)
      partitionOfRecords.foreach(pair => {
        val keystr = pair._1
        val valuestr = pair._2
        jedis.set(keystr, valuestr)
      })
    })
  }

  def getRdd(spark: SparkContext): RDD[(String, String)] = {
    val rdd = spark.parallelize(List("2017\t1", "2018\t2", "2017\t3", "2018\t4", "2017\t5", "2018\t6")).map(line => {
      val cols = line.split("\t")
      (cols(0), cols(1))
    })
    rdd.reduceByKey((x, y) => {
      ((x.toInt + y.toInt).toString)
    }, 3)
  }
}

When I replace redisHost with a local variable like this, it works fine again.

    val localRedisHost = redisHost
    rdd.foreachPartition(partitionOfRecords => {
      logger.info("host:port:" + localRedisHost + ":" + redisPort.toString)
      val jedis = new Jedis(localRedisHost, redisPort)
      partitionOfRecords.foreach(pair => {
        val keystr = pair._1
        val valuestr = pair._2
        jedis.set(keystr, valuestr)
      })
    })

Can anyone explain how the Spark closure works here?

Thanks so much.

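It's because redisHost is a member of the object, and an object member is not serialized along with the closure. The closure only refers to TestSparkClosure.redisHost; each executor JVM initializes the object on its own, so it sees the default 127.0.0.1, while the assignment in main happens only on the driver. In local mode the driver and the executor share one JVM, which is why the change is visible there. When you define a local variable, its value is captured by the closure and serialized with it, so you are able to use it inside the RDD operation.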
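The difference can be seen without a cluster. The following is a minimal sketch, not the asker's code: Settings and ClosureDemo are hypothetical names, and it assumes Scala 2.12 or later, where function literals are serializable. It Java-serializes two closures, which is roughly what Spark does before shipping a task, and checks whether the host string ends up in the serialized bytes.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for the question's TestSparkClosure object.
object Settings {
  var redisHost: String = "127.0.0.1"
}

object ClosureDemo {
  // Java-serialize a value into a byte array.
  def serialize(value: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    buffer.toByteArray
  }

  // True if the ASCII text appears anywhere in the serialized bytes.
  def containsText(bytes: Array[Byte], text: String): Boolean =
    new String(bytes, StandardCharsets.ISO_8859_1).contains(text)

  // Reads the object member when it runs; the closure captures nothing.
  def memberClosure(): () => String = () => Settings.redisHost

  // Copies the current value into a local val, which the closure captures.
  def localClosure(): () => String = {
    val localRedisHost = Settings.redisHost
    () => localRedisHost
  }

  def main(args: Array[String]): Unit = {
    Settings.redisHost = "10.242.10.100" // simulates redisHost = args(0) on the driver
    println(containsText(serialize(memberClosure()), "10.242.10.100"))
    println(containsText(serialize(localClosure()), "10.242.10.100"))
  }
}
```

Under those assumptions, the first check should report that the host is absent from the member-based closure and the second that it is present in the local-variable closure. That matches the behaviour in the question: copying redisHost into a local variable before foreachPartition is what carries the value to the executors.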
