I am using Spark 1.6 and came across the function `reduceByKeyAndWindow`, which I am using to perform a word count over data transmitted on a Kafka topic.
Following is the list of overloads that `reduceByKeyAndWindow` provides. As we can see, all of them have similar signatures with extra parameters.
When I use `reduceByKeyAndWindow` with just my reduce function, or with my reduce function and a window duration, it works and doesn't give me any errors, as shown below.
But when I use the overload that takes a reduce function, a window duration, and a slide duration, it starts giving me the following error; the same happens with the other overloads, as shown below.
I am not really sure what is happening here or how I can fix the problem.
Any help is appreciated.
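For context, a minimal sketch of the kind of pipeline described above (broker, topic, and app names are placeholders, and the Kafka receiver setup is assumed, not taken from the question):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholder configuration; adjust to your environment.
val conf = new SparkConf().setAppName("WindowedWordCount")
val ssc = new StreamingContext(conf, Seconds(2))

// Receiver-based Kafka stream (Spark 1.6 API); values are the message bodies.
val lines = KafkaUtils.createStream(ssc, "localhost:2181", "wordcount-group", Map("mytopic" -> 1))
  .map(_._2)

val words = lines.flatMap(_.split(" "))
val pairs = words.map(x => (x, 1L))

// This overload (reduce function + window duration) works as described.
val counts = pairs.reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(10))
counts.print()

ssc.start()
ssc.awaitTermination()
```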
If you comment out the line `words.map(x => (x, 1L))`, you should be able to use the method `reduceByWindow(_ + _, Seconds(2), Seconds(2))` from `DStream`.
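A short sketch of that suggestion, assuming `ssc` is an existing `StreamingContext` and `lines` a `DStream[String]` (both placeholders):

```scala
// Without the (word, 1L) mapping, words is a plain DStream[String],
// so DStream.reduceByWindow applies directly.
val words = lines.flatMap(_.split(" "))

// For strings, _ + _ concatenates the elements in each 2-second window.
val windowed = words.reduceByWindow(_ + _, Seconds(2), Seconds(2))
windowed.print()
```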
If you transform the words into (word, count) pairs, then you should use the method below.
reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
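A sketch of that incremental variant in context, assuming `pairs` is a `DStream[(String, Long)]` as in the question; note this overload requires checkpointing (the checkpoint path here is a placeholder):

```scala
// Required by the inverse-reduce overload, so old values can be subtracted
// as they slide out of the window.
ssc.checkpoint("/tmp/spark-checkpoint")

val counts = pairs.reduceByKeyAndWindow(
  (a: Long, b: Long) => a + b, // reduce: add counts entering the window
  (a: Long, b: Long) => a - b, // inverse reduce: remove counts leaving the window
  Minutes(10),                 // window duration
  Seconds(2),                  // slide duration
  2                            // number of partitions
)
counts.print()
```

Spelling out the parameter types on the reduce functions also avoids ambiguity between the overloads when using the `_ + _` shorthand.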
Please see the documentation for more details on what the reduce function and the inverse reduce function are: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala