
Weird behavior of reduceByKeyAndWindow function in Spark

I am using Spark 1.6 and came across the function reduceByKeyAndWindow, which I am using to perform a word count over data transmitted over a Kafka topic.

Following is the list of alternatives that reduceByKeyAndWindow provides. As we can see, all the alternatives have similar signatures with extra parameters.

[Image: list of functions]

But when I use reduceByKeyAndWindow with just my reduce function, or with my reduce function and a duration, it works and doesn't give me any errors, as shown below.

[Image: no errors or warnings here]

But when I use the alternative with a reduce function, duration and sliding window time, it starts giving me the following error. The same happens with the other alternatives, as shown below.

[Image: error message received]

I am not really sure what is happening here or how I can fix the problem.

Any help is appreciated

If you comment out the line .words.map(x => (x, 1L)), you should be able to use the method .reduceByWindow(_+_, Seconds(2), Seconds(2)) from DStream.

If you transform the words into (word, count) pairs, then you should use the method below.

reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
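For context, here is a minimal sketch of how that call might sit in a complete Spark Streaming job. The socket source on localhost:9999, the checkpoint directory and the object name are placeholders chosen for illustration; the question reads from Kafka instead. The variant that takes an inverse reduce function requires checkpointing to be enabled. Spelling out the parameter types of the two function literals is only a precaution against the compiler failing to infer them across the overloads; whether that is the cause of the error in the question is an assumption, since the error text is not shown.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowedWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowedWordCountSketch")
    // Batch interval of 2 seconds; window and slide durations must be multiples of it.
    val ssc = new StreamingContext(conf, Seconds(2))

    // The inverse-reduce variant needs checkpointing. The path is a placeholder.
    ssc.checkpoint("/tmp/spark-streaming-checkpoint")

    // Placeholder source (assumption): a text socket instead of the Kafka topic.
    val lines = ssc.socketTextStream("localhost", 9999)

    val words = lines.flatMap(_.split(" "))
    val pairs = words.map(x => (x, 1L))

    // Explicit parameter types so that overload resolution does not have to infer them.
    val counts = pairs.reduceByKeyAndWindow(
      (a: Long, b: Long) => a + b, // reduce: add counts entering the window
      (a: Long, b: Long) => a - b, // inverse reduce: subtract counts leaving the window
      Minutes(10),                 // window duration
      Seconds(2),                  // slide duration
      2                            // number of partitions
    )

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

To try it locally, something has to be writing lines to the socket (for example `nc -lk 9999`), and the Spark Streaming dependency must be on the classpath.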

Please see the documentation for more details on what the reduce function and inverse reduce function are: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
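To make the roles of the reduce function and the inverse reduce function concrete, here is a small plain-Scala model of a sliding window. It does not use Spark and is not how Spark implements the operation internally; the object name, helper names and sample batches are all made up for illustration. The idea it models is that the new window value can be obtained from the previous one by "adding" the batch that enters the window (reduce) and "subtracting" the batch that leaves it (inverse reduce), instead of recomputing over every batch in the window.

```scala
object InverseReduceSketch {
  // Per-batch word counts, one Map per batch interval (made-up sample data).
  val batches: Vector[Map[String, Long]] = Vector(
    Map("a" -> 2L, "b" -> 1L),
    Map("a" -> 1L),
    Map("b" -> 3L),
    Map("a" -> 4L, "c" -> 1L)
  )

  val reduce: (Long, Long) => Long = _ + _    // like _ + _ in the call above
  val invReduce: (Long, Long) => Long = _ - _ // like _ - _ in the call above

  // Merge two maps key by key with f; a missing key counts as 0.
  def combine(x: Map[String, Long], y: Map[String, Long],
              f: (Long, Long) => Long): Map[String, Long] =
    (x.keySet ++ y.keySet).map(k => k -> f(x.getOrElse(k, 0L), y.getOrElse(k, 0L))).toMap

  // Recompute the window ending at batch `end` from scratch, using reduce only.
  def fullWindow(end: Int, size: Int): Map[String, Long] =
    batches.slice(math.max(0, end - size + 1), end + 1)
      .foldLeft(Map.empty[String, Long])(combine(_, _, reduce))

  // Update the previous window: add the entering batch, subtract the leaving one.
  def incremental(prev: Map[String, Long], end: Int, size: Int): Map[String, Long] = {
    val added = combine(prev, batches(end), reduce)
    val leaving = end - size
    if (leaving >= 0) combine(added, batches(leaving), invReduce) else added
  }

  def main(args: Array[String]): Unit = {
    val size = 2 // window length, in batches
    var window = Map.empty[String, Long]
    for (end <- batches.indices) {
      window = incremental(window, end, size)
      // Keys whose count drops to 0 would linger in the incremental result,
      // so they are filtered out before comparing and printing.
      val nonZero = window.filter(_._2 != 0L)
      assert(nonZero == fullWindow(end, size))
      println(s"window ending at batch $end: $nonZero")
    }
  }
}
```

Both approaches give the same counts, but the incremental one touches only two batches per slide, which is the motivation for supplying an inverse function when the window is long relative to the slide interval.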
