
What is the difference between Apache Spark compute and slice?

I am trying to write a unit test on a DStream.

I put data into my stream with a mutable queue: ssc.queueStream(red)

I set the ManualClock to 0, start my streaming context, and advance my ManualClock by batchDuration millis.

When I run

stream.slice(Time(0), Time(clock.getTimeMillis())).map(_.collect().toList)

I get a result.

When I do

for (time <- 0L to stream.slideDuration.milliseconds + 10) {
  println("time " + time + " " + stream.compute(Time(time)).map(_.collect().toList))
}

none of them contains a result, not even stream.compute(Time(clock.getTimeMillis())).

So what is the difference between these two functions, apart from the differences in their parameters?

compute will return an RDD only if the provided time is a valid time in the sliding window, i.e. it is the zero time plus a multiple of the slide duration.
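A minimal sketch of that validity check, adapted from the private isTimeValid helper in Spark's DStream (the standalone signature and names here are my own):

import org.apache.spark.streaming.{Duration, Time}

// Sketch: a time is valid only if it lies after zeroTime and is an
// exact multiple of slideDuration away from it.
def isTimeValid(time: Time, zeroTime: Time, slideDuration: Duration): Boolean =
  time > zeroTime && (time - zeroTime).isMultipleOf(slideDuration)

// With zeroTime = Time(0) and slideDuration = Duration(1000):
//   isTimeValid(Time(1000), Time(0), Duration(1000))  // true
//   isTimeValid(Time(1001), Time(0), Duration(1000))  // false

This is why the millisecond-by-millisecond loop in the question prints None almost everywhere: only the exact batch boundaries are valid.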

slice will align both the from and to times to the slide duration and then compute for each aligned time in between.
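The alignment amounts to flooring each boundary down to the previous batch time; a rough sketch of that step, paraphrasing Spark's DStream.slice (Time.floor is the overload from the Spark source):

// Sketch: a boundary that does not fall exactly on a batch time is
// floored to the previous multiple of slideDuration, relative to zeroTime.
val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
val alignedToTime = toTime.floor(slideDuration, zeroTime)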

In slice you provide a time interval, and from that interval, as long as it is valid, a Seq[Time] is generated:

def to(that: Time, interval: Duration): Seq[Time] = {
  (this.milliseconds) to (that.milliseconds) by (interval.milliseconds) map (new Time(_))
}
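For example, assuming a slide duration of 1000 ms (Time.to is the method from the Spark source just shown):

import org.apache.spark.streaming.{Duration, Time}

// The interval [0 ms, 3000 ms] expands to the four batch times it covers:
val times = Time(0).to(Time(3000), Duration(1000))
// times == Seq(Time(0), Time(1000), Time(2000), Time(3000))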

and then we "compute" for each instance in that Seq[Time]:

alignedFromTime.to(alignedToTime, slideDuration).flatMap { time =>
  if (time >= zeroTime) getOrCompute(time) else None
}

As opposed to slice, with compute we only compute for the single instance of Time that we pass to the compute method.
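Applied to the test in the question: stepping by one millisecond asks compute for mostly invalid times. A hedged sketch of a loop that only hits valid batch times (assuming zeroTime is 0, and that stream and clock are the DStream and ManualClock from the question):

// Step by slideDuration so that every queried time is a batch boundary.
val step = stream.slideDuration.milliseconds
for (time <- step to clock.getTimeMillis() by step) {
  println("time " + time + " " + stream.compute(Time(time)).map(_.collect().toList))
}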
