How to use Spark to analyze pv, uv, ip every 5 minutes

How can I compute uv, pv, and ip counts for every 5-minute interval of the day and store the results in MySQL? The data comes from Kafka in the following format:

Message sent: {"cookie":"a95f22eabc4fd4b580c011a3161a9d9d","ip":"125.119.144.252","event_time":"2017-08-07 10:50:16"}
Message sent: {"cookie":"6b67c8c700427dee7552f81f3228c927","ip":"202.109.201.181","event_time":"2017-08-07 10:50:26"}

The intervals should be aligned like 00:00-00:05, 00:05-00:10, and so on. I used:

import org.apache.spark.sql.streaming.Trigger

val write = new JDBCSink() // my own ForeachWriter
val query = counts.writeStream.foreach(write).outputMode("complete")
  .trigger(Trigger.ProcessingTime("5 minutes"))
  .start()

But if I submit the job at 00:01, or it restarts after a crash, how can I be sure it will not analyze intervals such as 00:01-00:06?

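Use the window function to group by event time instead of relying on the trigger. Window boundaries are derived from the event_time column, not from when the query starts, so 5-minute windows fall on 00:00-00:05, 00:05-00:10, and so on, regardless of when the job is submitted or restarted. The trigger only controls how often results are updated: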

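from pyspark.sql.functions import approx_count_distinct, count, window
query = counts.groupBy(window('event_time', '5 minutes')).agg(
    count('*').alias('pv'), approx_count_distinct('cookie').alias('uv'), approx_count_distinct('ip').alias('ip'))
query.writeStream.start()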
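
To see why event-time windows do not drift with the submit time, here is a minimal plain-Python sketch of the same bucketing idea. It is not Spark code. The helper names window_start and aggregate are made up for illustration, and it assumes pv means the number of events, uv the number of distinct cookies, and ip the number of distinct IP addresses. Each timestamp is floored to a multiple of 5 minutes counted from 1970-01-01 00:00:00, which is how tumbling windows are aligned when no start offset is given.

```python
import json
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)

# The two sample messages from the question, without the "Message sent: " prefix.
SAMPLE = [
    '{"cookie":"a95f22eabc4fd4b580c011a3161a9d9d","ip":"125.119.144.252","event_time":"2017-08-07 10:50:16"}',
    '{"cookie":"6b67c8c700427dee7552f81f3228c927","ip":"202.109.201.181","event_time":"2017-08-07 10:50:26"}',
]


def window_start(event_time, size=WINDOW):
    """Floor event_time to the start of its tumbling window, counted from the epoch."""
    return event_time - (event_time - EPOCH) % size


def aggregate(lines):
    """Return {window_start: (pv, uv, ip)} for an iterable of JSON strings."""
    pv = defaultdict(int)
    cookies = defaultdict(set)
    ips = defaultdict(set)
    for line in lines:
        event = json.loads(line)
        ts = datetime.strptime(event["event_time"], "%Y-%m-%d %H:%M:%S")
        start = window_start(ts)
        pv[start] += 1                       # assumed: pv = number of events
        cookies[start].add(event["cookie"])  # assumed: uv = distinct cookies
        ips[start].add(event["ip"])          # assumed: ip = distinct addresses
    return {w: (pv[w], len(cookies[w]), len(ips[w])) for w in pv}


if __name__ == "__main__":
    for start, (n_pv, n_uv, n_ip) in sorted(aggregate(SAMPLE).items()):
        print(start, start + WINDOW, n_pv, n_uv, n_ip)
```

Both sample messages land in the 10:50-10:55 window. An event at 00:01:30 is assigned to 00:00-00:05 no matter when the job runs, because the bucket depends only on the event's own timestamp.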
