Spark Scala Dynamic creation of Serializable object

I need to use a tester in a Scala Spark filter. The tester implements Java's Predicate interface, and its concrete class name is passed in as an argument. I'm doing something like this:

val tester = Class.forName(qualifiedName).newInstance().asInstanceOf[Predicate[T]]
var filtered = rdd.filter(elem => tester.test(elem))

The problem is that at runtime I get a Spark "Task not serializable" exception, because my specific Predicate class is not Serializable.

If I do

val tester = Class.forName(qualifiedName).newInstance()
             .asInstanceOf[Predicate[T] with Serializable]
var filtered = rdd.filter(elem => tester.test(elem))

I get the same error. If I create the tester inside the rdd.filter call, it works:

var filtered = rdd.filter { elem => 
    val tester = Class.forName(qualifiedName).newInstance()
             .asInstanceOf[Predicate[T] with Serializable]
    tester.test(elem)
}

But I would like to create a single object (maybe to broadcast) for testing. How can I resolve this?

You simply have to require that the class implements Serializable. Note that the asInstanceOf[Predicate[T] with Serializable] cast is a lie: it doesn't actually check that the value is Serializable, which is why the second case doesn't produce an error immediately at the cast. The last case only "succeeds" because the tester is created inside the closure, on the executor, so it is never serialized at all.
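
Since the cast cannot be trusted, one option is to check the loaded class explicitly and fail fast on the driver. The following is only a minimal sketch: PredicateLoader, PlainIsEven, and SerializableIsEven are hypothetical names made up for illustration, and it assumes the predicate class has a public no-argument constructor and is compiled as a regular source file (not typed into a REPL).

```scala
import java.util.function.Predicate

// Hypothetical predicate classes, used only for illustration.
class PlainIsEven extends Predicate[Integer] {
  override def test(n: Integer): Boolean = n.intValue % 2 == 0
}

class SerializableIsEven extends Predicate[Integer] with java.io.Serializable {
  override def test(n: Integer): Boolean = n.intValue % 2 == 0
}

// Hypothetical helper: verifies the interfaces on the Class object itself,
// because asInstanceOf with a compound type does not verify Serializable.
object PredicateLoader {
  def load[T](qualifiedName: String): Predicate[T] with java.io.Serializable = {
    val cls = Class.forName(qualifiedName)
    require(classOf[Predicate[_]].isAssignableFrom(cls),
      s"$qualifiedName does not implement java.util.function.Predicate")
    require(classOf[java.io.Serializable].isAssignableFrom(cls),
      s"$qualifiedName does not implement java.io.Serializable")
    // Same effect as the question's Class.newInstance(), which has been
    // deprecated since Java 9. Assumes a public no-argument constructor.
    cls.getDeclaredConstructor().newInstance()
      .asInstanceOf[Predicate[T] with java.io.Serializable]
  }
}
```

With this, PredicateLoader.load[Integer](classOf[PlainIsEven].getName) throws an IllegalArgumentException up front, instead of the failure surfacing later when Spark tries to serialize the closure.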

But I would like to create a single object (maybe to broadcast) for testing.

You can't. Broadcast or not, deserialization will create new objects on the worker nodes. But you can create just one instance per partition:

var filtered = rdd.mapPartitions { iter => 
    val tester = Class.forName(qualifiedName).newInstance()
             .asInstanceOf[Predicate[T]]
    iter.filter(tester.test)
}

This will actually perform better than serializing the tester, sending it, and deserializing it, since it is strictly less work.
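
To make the per-partition idea easier to try without a cluster, here is a rough sketch of the same logic pulled out into a plain function over iterators. PartitionFilter and IsPositive are hypothetical names introduced for illustration. The point is that only the class name (a String) is handed to the function, so the predicate itself never needs to be Serializable.

```scala
import java.util.function.Predicate

// Hypothetical predicate, deliberately NOT Serializable.
class IsPositive extends Predicate[Integer] {
  override def test(n: Integer): Boolean = n.intValue > 0
}

object PartitionFilter {
  // Creates one predicate instance per call, i.e. one per partition when
  // this is used as the body of mapPartitions. Only `qualifiedName` would
  // have to travel to the executors.
  def filterPartition[T](qualifiedName: String)(iter: Iterator[T]): Iterator[T] = {
    val tester = Class.forName(qualifiedName)
      .getDeclaredConstructor()   // assumes a public no-argument constructor
      .newInstance()
      .asInstanceOf[Predicate[T]]
    iter.filter(elem => tester.test(elem))
  }
}

object PartitionFilterDemo {
  def main(args: Array[String]): Unit = {
    // A local iterator stands in for a single Spark partition here.
    val partition = Iterator[Integer](-1, 2, 3)
    val kept = PartitionFilter.filterPartition[Integer](classOf[IsPositive].getName)(partition)
    println(kept.toList)
  }
}
```

In a Spark job this would presumably be wired in as rdd.mapPartitions(iter => PartitionFilter.filterPartition[T](qualifiedName)(iter)), which matches the snippet above apart from the non-deprecated constructor call.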
