
How to benchmark Kafka with Spark Streaming?

I need to benchmark a Spark Streaming job. The job pulls messages from Kafka, processes them, and loads the results into Elasticsearch. The upstream produces 100k records per second, so I would like to measure how many messages are processed per second and what the latency is. Are there any tools available to monitor this, or is there a procedure to calculate it?

The Spark UI can help; it provides the details you need. By default, the Spark UI is available in a web browser on port 4040 (http://:4040, for a single SparkContext). For more information, see http://spark.apache.org/docs/latest/monitoring.html
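To turn per-batch numbers into the two figures asked about (records per second and latency), something like the following might work. It is only a sketch: it assumes you have already collected one record per batch, shaped roughly like the entries the monitoring REST API reports for streaming batches. The field names (`status`, `inputSize`, `processingTime`, `schedulingDelay`, `totalDelay`, times in milliseconds) are assumptions to check against your Spark version, and `summarize_batches` is a hypothetical helper, not part of Spark.

```python
from statistics import mean


def summarize_batches(batches):
    """Summarize throughput and delay over completed batches.

    Each batch is a dict; the field names are assumed, not guaranteed,
    to match what your Spark version reports. Times are in milliseconds.
    """
    done = [b for b in batches if b.get("status") == "COMPLETED"]
    if not done:
        return None
    total_records = sum(b["inputSize"] for b in done)
    processing_seconds = sum(b["processingTime"] for b in done) / 1000.0
    return {
        "batches": len(done),
        "records": total_records,
        # Records handled per second of actual processing time.
        "records_per_processing_second": (
            total_records / processing_seconds
            if processing_seconds > 0 else float("inf")
        ),
        # Time a batch waited before processing started.
        "mean_scheduling_delay_ms": mean(b["schedulingDelay"] for b in done),
        # Scheduling delay plus processing time.
        "mean_total_delay_ms": mean(b["totalDelay"] for b in done),
    }


# Hypothetical sample data: two finished batches and one still running.
sample = [
    {"status": "COMPLETED", "inputSize": 100000, "processingTime": 800,
     "schedulingDelay": 10, "totalDelay": 810},
    {"status": "COMPLETED", "inputSize": 100000, "processingTime": 1200,
     "schedulingDelay": 30, "totalDelay": 1230},
    {"status": "PROCESSING", "inputSize": 100000},
]
summary = summarize_batches(sample)
print(summary)
```

If the mean scheduling delay keeps growing from batch to batch, the job is probably not keeping up with the 100k records per second coming from upstream.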

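Besides the Spark UI (which you can use to see how fast your data is being processed), you can also use third-party tools such as spark-perf to load-test your cluster and obtain benchmark numbers that way.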

You could also try Yahoo's streaming-benchmarks. I found that Databricks used that tool to benchmark Spark streaming against Flink.

https://github.com/yahoo/streaming-benchmarks
https://databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html
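If you want an end-to-end number without a full benchmark suite, one common approach is to stamp each message when it is produced and compare that with the time it lands in the sink. The sketch below assumes you can export, per record, a producer timestamp and an Elasticsearch index timestamp. The field names `produced_at_ms` and `indexed_at_ms` and both helper functions are hypothetical, and this is not necessarily how the tools linked above measure latency.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an ascending list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def end_to_end_latency(records):
    """Latency percentiles and overall throughput for a set of records.

    Each record is a dict with two hypothetical fields, both in epoch
    milliseconds: when it was produced and when it was indexed.
    """
    latencies = sorted(r["indexed_at_ms"] - r["produced_at_ms"] for r in records)
    first_produced = min(r["produced_at_ms"] for r in records)
    last_indexed = max(r["indexed_at_ms"] for r in records)
    span_seconds = (last_indexed - first_produced) / 1000.0
    return {
        "count": len(latencies),
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "max_ms": latencies[-1],
        "records_per_second": (
            len(latencies) / span_seconds if span_seconds > 0 else float("inf")
        ),
    }


# Hypothetical sample: 100 records produced 10 ms apart, each taking
# between 100 and 300 ms to reach the sink.
records = [
    {"produced_at_ms": 1000 + i * 10,
     "indexed_at_ms": 1000 + i * 10 + 100 + (i % 5) * 50}
    for i in range(100)
]
stats = end_to_end_latency(records)
print(stats)
```

This only gives meaningful results if the producer and the sink clocks are synchronized, and tail percentiles such as p99 are usually more telling than the mean for a streaming job.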

