
What is the difference between Apache Spark compute and slice?

I am trying to write a unit test on a DStream.

I put data into my stream with a mutable queue: ssc.queueStream(red)

I set the ManualClock to 0, start my streaming context, and advance my ManualClock by batchDuration millis.

When I run

stream.slice(Time(0), Time(clock.getTimeMillis())).map(_.collect().toList)

I get a result.

When I do

for (time <- 0L to stream.slideDuration.milliseconds + 10) {
  println("time " + time + " " + stream.compute(Time(time)).map(_.collect().toList))
}

none of them contains a result, not even stream.compute(Time(clock.getTimeMillis())).

So what is the difference between these two functions, apart from the differences in their parameters?

compute will return an RDD only if the provided time is a valid time in the sliding window, i.e. it is the zero time plus a multiple of the slide duration.
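A minimal sketch of that validity check, adapted from the private isTimeValid helper in Spark's DStream (the standalone signature and names here are my own):

import org.apache.spark.streaming.{Duration, Time}

// Sketch: a time is valid only if it lies after zeroTime and is an
// exact multiple of slideDuration away from it.
def isTimeValid(time: Time, zeroTime: Time, slideDuration: Duration): Boolean =
  time > zeroTime && (time - zeroTime).isMultipleOf(slideDuration)

// With zeroTime = Time(0) and slideDuration = Duration(1000):
//   isTimeValid(Time(1000), Time(0), Duration(1000))  // true
//   isTimeValid(Time(1001), Time(0), Duration(1000))  // false

This is why the millisecond-by-millisecond loop in the question prints None almost everywhere: only the exact batch boundaries are valid.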

slice will align both the from and to times to the slide duration and then compute for each aligned time in between.
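The alignment amounts to flooring each boundary down to the previous batch time; a rough sketch of that step, paraphrasing Spark's DStream.slice (Time.floor is the overload from the Spark source):

// Sketch: a boundary that does not fall exactly on a batch time is
// floored to the previous multiple of slideDuration, relative to zeroTime.
val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
val alignedToTime = toTime.floor(slideDuration, zeroTime)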

In slice you provide a time interval, and from that interval, as long as it is valid, a Seq[Time] is generated:

def to(that: Time, interval: Duration): Seq[Time] = {
  (this.milliseconds) to (that.milliseconds) by (interval.milliseconds) map (new Time(_))
}
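For example, assuming a slide duration of 1000 ms (Time.to is the method from the Spark source just shown):

import org.apache.spark.streaming.{Duration, Time}

// The interval [0 ms, 3000 ms] expands to the four batch times it covers:
val times = Time(0).to(Time(3000), Duration(1000))
// times == Seq(Time(0), Time(1000), Time(2000), Time(3000))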

and then we "compute" for each instance in that Seq[Time]:

alignedFromTime.to(alignedToTime, slideDuration).flatMap { time =>
  if (time >= zeroTime) getOrCompute(time) else None
}

As opposed to slice, with compute we only compute for the single instance of Time that we pass to the compute method.
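Applied to the test in the question: stepping by one millisecond asks compute for mostly invalid times. A hedged sketch of a loop that only hits valid batch times (assuming zeroTime is 0, and that stream and clock are the DStream and ManualClock from the question):

// Step by slideDuration so that every queried time is a batch boundary.
val step = stream.slideDuration.milliseconds
for (time <- step to clock.getTimeMillis() by step) {
  println("time " + time + " " + stream.compute(Time(time)).map(_.collect().toList))
}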
