
Task not serializable exception

For some reason I am getting a Task not serializable exception with the following code. I am running this in Spark local mode via sbt test.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.junit.runner.RunWith
import org.scalatest.{FeatureSpec, Matchers}
import org.scalatest.junit.JUnitRunner

@RunWith(classOf[JUnitRunner])
class NQTest extends FeatureSpec with Matchers with Serializable {
  val conf = new SparkConf().setAppName("NQ Market Makers Test").setMaster("local")
  val sc = new SparkContext(conf)
  ...

  // Each line of the test file has the form "<test case name>:<data>".
  val testData: RDD[(String, String)] = sc.textFile("testcases/NQIntervalsTestData")
    .map { line => (line.split(":", 2)(0), line.split(":", 2)(1)) }
  testData.persist()

  // Select the lines belonging to one test case, dropping the name prefix.
  def testDatasets(input: Int) = {
    testData.filter {
      case (s, _) => s == "Test Case " + input
    }.map {
      case (_, line) => line
    }
  }

  ...

  feature("NQIntervals") {
    scenario("Test data sanity check") {
      testDatasets(1).collect() should not be null
    }
  }
}

And the exception:

org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
        at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
        at test.scala.org.<redacted>.NQTest$.testDatasets(NQTest.scala:31)

Unlike the other Stack Overflow questions I've seen regarding this exception, this one seems to concern the RDD itself rather than the function I've passed to filter.

For example, we can remove the filter and map entirely and we still end up with an exception during the collect. From my googling I've only been able to find answers to problems involving non-serializable objects inside a filter or a map, not problems with the RDD itself.
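For context, the failure mode those other answers describe usually looks like the hypothetical sketch below: referencing a field of a non-serializable enclosing class from inside a closure captures `this`, and the standard fix is to copy the value into a local val first. The class and field names here are made up purely for illustration; this is not the original test code.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical enclosing class that is not Serializable.
class Helper(sc: SparkContext) {
  val prefix = "Test Case "

  // Referencing `prefix` inside the closure captures `this` (a Helper),
  // which is not Serializable -> "Task not serializable".
  def broken(data: RDD[String]): RDD[String] =
    data.filter(line => line.startsWith(prefix))

  // Copying the field into a local val means only the String is captured.
  def fixed(data: RDD[String]): RDD[String] = {
    val p = prefix
    data.filter(line => line.startsWith(p))
  }
}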

Things I've tried so far:

  • Removed the filter and map inside the testDatasets method and just returned the testData set; the exception then occurred when collect was called.
  • Removed the unit testing framework entirely, made NQTest extend Serializable directly, and wrote a one-line main method consisting of testDatasets(1).collect(): still the same exception.
  • Removed testData.persist(): still the same exception.

Any insight would be welcome!

Turns out I was a huge idiot and was stopping the Spark context before the actual tests were being run. Disregard.
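For anyone who lands here with the same self-inflicted problem: once sc.stop() has run, subsequent RDD operations can fail with this misleading exception. Below is a minimal sketch of one way to avoid it, assuming ScalaTest's BeforeAndAfterAll trait so the context is only stopped after all scenarios have run. The trait and its afterAll hook are standard ScalaTest; the class name and test body here are illustrative, not the original code.

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FeatureSpec, Matchers}

class NQTestLifecycle extends FeatureSpec with Matchers with BeforeAndAfterAll {
  // Create the context once for the whole suite.
  val sc = new SparkContext(
    new SparkConf().setAppName("NQ Market Makers Test").setMaster("local"))

  // Stop the context only after every scenario has finished,
  // never in the constructor body or inside an earlier test.
  override def afterAll(): Unit = {
    sc.stop()
  }

  feature("NQIntervals") {
    scenario("Test data sanity check") {
      sc.parallelize(Seq(1, 2, 3)).collect() should not be null
    }
  }
}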
