
How can I remove an RDD from a DStream in Spark Streaming?

I would like to drop the first n RDDs from a DStream. I tried using the following function along with transform, but it doesn't work (ERROR OneForOneStrategy: org.apache.spark.SparkContext java.io.NotSerializableException), and I don't think it would accomplish my real goal of removing the RDDs because it would return empty ones.

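import org.apache.spark.rdd.RDD

// Counts how many batches have been replaced so far
var num = 0
def dropNrdds(myRDD: RDD[(String, Int)], dropNum: Int): RDD[(String, Int)] = {
    if (num < dropNum) {
        // One of the first dropNum batches: replace it with an empty RDD
        num = num + 1
        sc.makeRDD(Seq.empty[(String, Int)])
    }
    else {
        // Every later batch passes through unchanged
        myRDD
    }
}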

The error occurs because your function refers to your var num, and the containing class is not Serializable. Your function is going to be called by different nodes of the cluster, so anything it depends on has to be Serializable, and you can't share a variable between different invocations of your function (because they might be running on different cluster nodes).
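To see why referring to var num matters, here is a minimal, Spark-free sketch that uses plain Java serialization, which is Spark's default mechanism for shipping closures. Holder is a hypothetical stand-in for your enclosing class, and the sketch assumes Scala 2.12 or later, where lambdas are compiled as serializable and capture this only when they touch a member of the class.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the class that holds `var num` (and `sc`).
// It does NOT extend Serializable.
class Holder {
  var num = 0
  // Reads the field `num`, so the lambda captures `this` (the whole Holder).
  val capturesHolder: Int => Int = x => x + num
  // Copies the field into a local val first; the lambda captures only an Int.
  val capturesLocal: Int => Int = {
    val n = num
    x => x + n
  }
}

object SerializationDemo {
  // True if Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val h = new Holder
    println(isSerializable(h.capturesHolder)) // false
    println(isSerializable(h.capturesLocal))  // true
  }
}
```

Copying a field into a local val before building the closure is the usual way to stop a closure from dragging its enclosing class along. It does not, however, give different invocations a shared mutable counter.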

It seems very odd to want to drop a specific number of RDDs from a DStream, given that the way a particular DStream is split up into RDDs is pretty much an implementation detail. Perhaps the time-based slice method (DStream.slice(fromTime, toTime), which returns the RDDs for a given time range) can be made to do what you want?
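One way to follow this time-based idea is to decide, for each batch, from its batch time instead of from a mutable counter. The sketch below is plain Scala and swaps in a transform-style approach rather than slice itself; keepBatch, firstBatchMs and dropCount are hypothetical names, and it assumes batch times are spaced exactly one batch interval apart, starting at a known first batch time.

```scala
object DropByBatchTime {
  // Hypothetical helper: a batch is kept only if its time is at or after the
  // cutoff, i.e. it is not one of the first `dropCount` batches that began at
  // `firstBatchMs` and were spaced `batchIntervalMs` apart.
  def keepBatch(batchTimeMs: Long, firstBatchMs: Long,
                batchIntervalMs: Long, dropCount: Int): Boolean =
    batchTimeMs >= firstBatchMs + batchIntervalMs * dropCount

  def main(args: Array[String]): Unit = {
    // Five simulated batch times, one second apart, starting at 1000 ms
    val batchTimes = (1 to 5).map(_ * 1000L)
    // Drop the first two batches
    println(batchTimes.filter(t => keepBatch(t, 1000L, 1000L, 2))) // Vector(3000, 4000, 5000)
  }
}
```

In a streaming job the same predicate could be applied inside dstream.transform((rdd, time) => ...), using time.milliseconds and returning rdd.sparkContext.emptyRDD for dropped batches. Because the decision depends only on the batch time, no counter has to be shared between invocations, and taking the context from rdd.sparkContext avoids referring to sc. Note that this still yields empty batches rather than removing them.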

You are getting this error, I am guessing, because you are calling this function from a foreachRDD loop. Anything Spark has to ship to the executor nodes must be Serializable, and SparkContext (sc, which you refer to inside your dropNrdds method) is not Serializable, hence the error. Strictly speaking, the function passed to foreachRDD (or transform) itself runs on the driver; what gets serialized are the closures passed to RDD operations inside it and, if checkpointing is enabled, the DStream graph together with its functions.

As for your actual question, I'm not sure about your exact requirement, but you can create a DataFrame from your RDD, select the records that match your criteria, and ignore the rest.

Alternatively, you can use filter to create a fresh RDD containing only the filtered data.
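The filter approach can be sketched as a stateless predicate defined on a top-level object, so the closure built from it captures only its parameter and not a non-serializable enclosing class or a SparkContext. The criterion (count at least minCount) and the names RecordFilters and atLeast are hypothetical; the sketch runs the predicate on a local Seq to stay Spark-free.

```scala
object RecordFilters {
  // Hypothetical criterion: keep (key, count) pairs whose count is at least minCount.
  // The returned function captures only the Int `minCount`.
  def atLeast(minCount: Int): ((String, Int)) => Boolean =
    pair => pair._2 >= minCount

  def main(args: Array[String]): Unit = {
    // Local stand-in for the contents of one batch
    val batch = Seq(("a", 1), ("b", 5), ("c", 3))
    println(batch.filter(atLeast(3))) // List((b,5), (c,3))
  }
}
```

In the streaming job, the same function could be passed to dstream.filter(RecordFilters.atLeast(3)), or to rdd.filter inside transform. Unlike the counter in the question, it keeps no state between batches, so each batch is filtered independently.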
