Struggling to handle deduplication after aggregation in Spark Streaming
1. Streaming data is coming from Kafka.
2. I am consuming it through Spark Streaming.
3. Each record has firstname, lastname, userid and membername. I use the member names to get the member count, e.g. for mark,tyson,2,chris,lisa,iwanka the member count is 3.
I have to do the count; it is a requirement. But how can I remove duplicates after the aggregation? That is my concern.
val df2 = df.select("firstname", "lastname", "membercount", "userid")
df2.writeStream.format("console").start().awaitTermination()
or
df3.select("*").where("membercount >= 3").dropDuplicates("userid")
// This one is not working, but I need to do the same after the count,
// so that the same userid does not come again in later batches.
// I want only the first entry.
Batch-1 output
firstname  lastname  member-count  userid
john       smith     5             1
mark       boucher   8             2
shawn      pollock   3             3
Batch-2 output
firstname  lastname  member-count        userid
john       smith     7 (prev. count 5)   1
shawn      pollock   12 (prev. count 8)  3
chris      jordan    6                   4
// But here is the batch-2 output I want:
1. It is possible that the counts for john smith and shawn pollock will increase again in later batches, but I do not want to show or keep them in the output for those batches.
I.e., based on userid, I want each user to appear only once in the batch output, and the same user to be ignored in later batches:
firstname  lastname  member-count  userid
chris      jordan    6             4
Your question is hard to read, but as I understand it, you want a while loop with a condition?
var a = 10;
while(a < 20){
println( "Value of a: " + a );
a = a + 1;
}
For example, this will print:
Value of a: 10
Value of a: 11
Value of a: 12
Value of a: 13
Value of a: 14
Value of a: 15
Value of a: 16
Value of a: 17
Value of a: 18
Value of a: 19
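If the goal is instead the one described in the question (each userid shown only the first time it reaches the count threshold, and ignored in later batches), the logic can be modelled in plain Scala as below. This is only a sketch of the idea, not Spark code: `Row`, `FirstSeenFilter` and `FirstSeenDemo` are hypothetical names, the batches are hard-coded from the question's example, and the threshold of 3 is taken from the `membercount >= 3` filter. How to carry this state inside a streaming query (for example with keyed state) is not shown here and would need to be worked out separately.

```scala
import scala.collection.mutable

// Hypothetical row type mirroring the columns in the question.
final case class Row(firstname: String, lastname: String, memberCount: Int, userid: Int)

// Remembers which userids were already emitted in an earlier batch.
final class FirstSeenFilter(minCount: Int) {
  private val seen = mutable.Set.empty[Int]

  // Keeps only rows that meet the count threshold and whose userid is new.
  // `seen.add` returns true only the first time a userid is added.
  def apply(batch: Seq[Row]): Seq[Row] =
    batch.filter(r => r.memberCount >= minCount && seen.add(r.userid))
}

object FirstSeenDemo {
  def main(args: Array[String]): Unit = {
    val filter = new FirstSeenFilter(minCount = 3)

    // Aggregated rows per batch, copied from the question's example.
    val batch1 = Seq(
      Row("john", "smith", 5, 1),
      Row("mark", "boucher", 8, 2),
      Row("shawn", "pollock", 3, 3))
    val batch2 = Seq(
      Row("john", "smith", 7, 1),
      Row("shawn", "pollock", 12, 3),
      Row("chris", "jordan", 6, 4))

    filter(batch1).foreach(println) // all three users are new
    filter(batch2).foreach(println) // only the new userid 4 remains
  }
}
```

With the example data, the second call keeps only the chris jordan row, because userids 1 and 3 were already emitted in batch 1. A row below the threshold is not marked as seen, so it can still appear once in a later batch when its count reaches the threshold.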