
Cannot resolve task not serializable [org.apache.spark.SparkException: Task not serializable] Spark Scala RDD

When I try to create an object of the class and call a particular method, newRDDblah, I keep getting the error stack trace shown below.

I start spark-shell with the jar on the classpath and run the following:

spark-shell --master=yarn --jars=sample_jar.jar --files database.cfg

scala> val reader = new Sample(spark)
scala> val a = reader.buildFileRDD("/xyz/path")

org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2294)
  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
  at Sample.newRDDscala(Sample.scala:117)
  ... 48 elided
Caused by: java.io.NotSerializableException:

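How do I resolve this error?

From the stack trace, it looks like you are using a DatabaseUtils object inside the closure. Because DatabaseUtils is not serializable, it cannot be shipped over the network to the executors. Try making DatabaseUtils serializable. Alternatively, you can make DatabaseUtils a Scala object.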

.. DatabaseUtils extends Serializable
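To illustrate how a closure can pull in a non-serializable object, here is a minimal sketch that needs no Spark at all. It uses plain Java serialization, which is what Spark uses for closures. The names ClosureCaptureDemo, Reader, capturingThis and capturingLocal are hypothetical and are not taken from the question's code; Reader only stands in for a class like Sample.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // True if Java serialization accepts the value, false if it hits a
  // non-serializable object anywhere in the object graph.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  // Hypothetical stand-in for a class like Sample: it is NOT Serializable.
  class Reader(val prefix: String) {
    // Reads the field `prefix`, so the function captures `this` (the whole Reader).
    def capturingThis: String => Boolean = line => line.startsWith(prefix)

    // Copies the field into a local val first, so only a String is captured.
    def capturingLocal: String => Boolean = {
      val localPrefix = prefix
      line => line.startsWith(localPrefix)
    }
  }
}
```

Passing capturingThis to something like rdd.filter would be rejected for the same reason as in the stack trace above, while capturingLocal would pass the serialization check. Copying the needed fields into local vals is therefore another option when the enclosing class cannot be made Serializable.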

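Change the DatabaseUtils code as follows. In the Sample class, remove the variables dbConfig and url and add this instead (there is no new, because the Config-based factory is the apply method on the companion object): val dbObj = DatabaseUtils(ConfigFactory.parseFile(new File(config)))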

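import java.sql.{Connection, DriverManager}
import scala.collection.mutable.ArrayBuffer
import com.typesafe.config.Config

// Holds only Strings, so an instance can be serialized and shipped to executors.
class DatabaseUtils(url: String, username: String, password: String) extends Serializable {

  val driver = "com.mysql.jdbc.Driver"

  // Runs the query and returns the db_string column of every row.
  def executeSelectQuery(qry: String): List[String] = {
    val dbString = ArrayBuffer.empty[String]
    var conn: Connection = null
    try {
      Class.forName(driver)
      // The connection is opened per call, so it is never part of the serialized state.
      conn = DriverManager.getConnection(url, username, password)
      val statement = conn.createStatement
      val rs = statement.executeQuery(qry)

      while (rs.next) dbString += rs.getString("db_string")

    } catch {
      case e: Exception => e.printStackTrace()
    } finally {
      // conn is still null if getConnection failed; avoid a NullPointerException here.
      if (conn != null) conn.close()
    }
    dbString.toList
  }
}

object DatabaseUtils {
  // Builds the JDBC URL from the config, so the class itself never holds a Config.
  def apply(dbConfig: Config): DatabaseUtils = {
    val url = "jdbc:mysql://" + dbConfig.getString("db.host") + ":" + dbConfig.getString("db.port") + "/" + dbConfig.getString("db.database") + "?useSSL=false"
    new DatabaseUtils(url, dbConfig.getString("db.username"), dbConfig.getString("db.password"))
  }
}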
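The design point of the code above (keep only Strings as state and open the connection inside the method) can be checked without a database. The following is only a sketch under stated assumptions: FakeConnection, EagerDb and LazyDb are hypothetical names, and FakeConnection merely stands in for a live, non-serializable resource such as a JDBC Connection.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializableHolderDemo {
  // True if Java serialization accepts the value.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  // Hypothetical stand-in for a live resource; deliberately not Serializable.
  class FakeConnection(val url: String)

  // Marked Serializable, but the eagerly created resource field still breaks it.
  class EagerDb(url: String) extends Serializable {
    val connection = new FakeConnection(url)
  }

  // Keeps only a String as state and creates the resource inside the method.
  class LazyDb(url: String) extends Serializable {
    def query(sql: String): String = {
      val connection = new FakeConnection(url)
      s"${connection.url} <- $sql"
    }
  }
}
```

Adding extends Serializable is therefore not enough on its own: every field the instance holds must be serializable too, which is why the rewritten DatabaseUtils takes plain Strings rather than holding a connection.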
