
How to access statistics endpoint for a Spark Streaming application?

As of Spark 2.2.0, there are new endpoints in the API for getting information about streaming jobs.

I run Spark on EMR clusters, using Spark 2.2.0 in cluster mode.

When I hit the endpoint for my streaming jobs, all it gives me is the error message:

no streaming listener attached to <stream name>

I've dug through the Spark codebase a bit, but this feature is not very well documented. So I'm curious: is this a bug? Is there some configuration I need to do to get this endpoint working?


This appears to be an issue specifically when running on the cluster. The same code running on Spark 2.2.0 on my local machine shows the statistics as expected, but gives that error message when run on the cluster.

I'm using the very latest Spark 2.3.0-SNAPSHOT built today from master, so YMMV. It worked fine.

Is there some configuration I need to do to get this endpoint working?

No. It's supposed to work fine with no changes to the default configuration.

Make sure you use the host and port of the driver. (Rumor has it that you can also reach the Spark History Server on port 18080, which shows all the same endpoints and the same running jobs, but with no streaming listener attached.)
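One way to tell whether the server you hit actually has the listener is to look for the error text quoted in the question. This is a minimal Python sketch using only the standard library; the function names are hypothetical, and it assumes the message arrives in the response body, as the question describes.

```python
import urllib.error
import urllib.request

# Error text quoted in the question.
ERROR_MARKER = "no streaming listener attached"


def has_streaming_listener(body: str) -> bool:
    """Return False when the body carries the error text from the question."""
    return ERROR_MARKER not in body


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Fetch a URL and return its body, including the body of an HTTP error reply."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Error replies still carry a body; that is where the message would be.
        return err.read().decode("utf-8", errors="replace")
```

Calling `has_streaming_listener(fetch_body(url))` against both the driver port and port 18080 should show which of the two is answering with the error.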


As you can see in the source code where the error message lives, it can happen only when ui.getStreamingJobProgressListener has not been registered (which ends up in case None).

So the question now is: why would that SparkListener not be registered?

That leads us to the streamingJobProgressListener var, which is set exclusively by the setStreamingJobProgressListener method while StreamingTab is being instantiated (which is why I asked whether you can see the Streaming tab).

In other words, if you see the Streaming tab in the web UI, the streaming metrics endpoint(s) are available. Check the URL of the endpoint, which should be in the format:

http://[driverHost]:[port]/api/v1/applications/[appId]/streaming/statistics
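To avoid typos when assembling that URL by hand, it can be built from its three parts. A small Python sketch; the function name `statistics_url` is made up for illustration, and the path follows the format shown above.

```python
from urllib.parse import quote


def statistics_url(driver_host: str, port: int, app_id: str) -> str:
    """Build the streaming statistics URL in the format given above."""
    return (
        f"http://{driver_host}:{port}/api/v1/applications/"
        f"{quote(app_id, safe='')}/streaming/statistics"
    )
```

For example, `statistics_url("localhost", 4040, "local-1513151634282")` yields the local URL used in the reproduction steps.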

I tried to reproduce your case, and the following steps led me to a working setup.

  1. Started one of the official examples of Spark Streaming applications.

     $ ./bin/run-example streaming.StatefulNetworkWordCount localhost 9999

    I did run nc -lk 9999 first.

  2. Opened the web UI @ http://localhost:4040/streaming to make sure the Streaming tab is there.

    [Screenshot: Streaming tab @ Web UI]

  3. Made sure http://localhost:4040/api/v1/applications/ responds with application ids.

     $ http http://localhost:4040/api/v1/applications/
     HTTP/1.1 200 OK
     Content-Encoding: gzip
     Content-Length: 266
     Content-Type: application/json
     Date: Wed, 13 Dec 2017 07:58:04 GMT
     Server: Jetty(9.3.z-SNAPSHOT)
     Vary: Accept-Encoding, User-Agent

     [
         {
             "attempts": [
                 {
                     "appSparkVersion": "2.3.0-SNAPSHOT",
                     "completed": false,
                     "duration": 0,
                     "endTime": "1969-12-31T23:59:59.999GMT",
                     "endTimeEpoch": -1,
                     "lastUpdated": "2017-12-13T07:53:53.751GMT",
                     "lastUpdatedEpoch": 1513151633751,
                     "sparkUser": "jacek",
                     "startTime": "2017-12-13T07:53:53.751GMT",
                     "startTimeEpoch": 1513151633751
                 }
             ],
             "id": "local-1513151634282",
             "name": "StatefulNetworkWordCount"
         }
     ]

  4. Accessed the endpoint for the Spark Streaming application @ http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics.

     $ http http://localhost:4040/api/v1/applications/local-1513151634282/streaming/statistics
     HTTP/1.1 200 OK
     Content-Encoding: gzip
     Content-Length: 219
     Content-Type: application/json
     Date: Wed, 13 Dec 2017 08:00:10 GMT
     Server: Jetty(9.3.z-SNAPSHOT)
     Vary: Accept-Encoding, User-Agent

     {
         "avgInputRate": 0.0,
         "avgProcessingTime": 30,
         "avgSchedulingDelay": 0,
         "avgTotalDelay": 30,
         "batchDuration": 1000,
         "numActiveBatches": 0,
         "numActiveReceivers": 1,
         "numInactiveReceivers": 0,
         "numProcessedRecords": 0,
         "numReceivedRecords": 0,
         "numReceivers": 1,
         "numRetainedCompletedBatches": 376,
         "numTotalCompletedBatches": 376,
         "startTime": "2017-12-13T07:53:54.921GMT"
     }
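Steps 3 and 4 can also be scripted by parsing the two JSON replies. The Python sketch below relies only on the field names visible in the responses above ("attempts", "completed", "id", "numTotalCompletedBatches"); the helper names are hypothetical, and it works on response text you have already fetched rather than making the HTTP calls itself.

```python
import json
from typing import List


def running_app_ids(applications_json: str) -> List[str]:
    """Return ids of applications that have at least one attempt not marked completed."""
    apps = json.loads(applications_json)
    return [
        app["id"]
        for app in apps
        if any(not attempt.get("completed", False) for attempt in app.get("attempts", []))
    ]


def completed_batches(statistics_json: str) -> int:
    """Read the total number of completed batches from a statistics reply."""
    return int(json.loads(statistics_json)["numTotalCompletedBatches"])
```

Fed the two replies shown in steps 3 and 4, the first helper would return the single id `local-1513151634282`, which is the value that goes into the statistics URL.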

TL;DR Just go to: http://localhost:4040/streaming

Had the same issue. I ran the Spark application from a PyCharm Python virtual environment. Spark reported that port 4040 was taken:

Spark context Web UI available at http://192.168.100.221:4042

but I saw no jobs there, and the Streaming tab was missing. Then I went to http://localhost:4040/streaming and, behold, everything was there.

If you look at PyCharm's output in the console window, it shows which port it used for streaming. I was assuming it was 4040, but when I checked the output carefully the port was 4041. Here is the output:

WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.

Then you can open localhost:4041 in any web browser and you should see the streaming output. Hope this helps!
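If the console output is long, the port can be pulled out of the warning line instead of read by eye. This Python sketch matches the exact wording of the WARN line quoted above; the function name is hypothetical, and it assumes the last "Attempting port" warning names the port the UI ended up on, which holds only if no later bind failure was logged.

```python
import re
from typing import Optional

# Matches the warning quoted above, capturing the failed port and the next port tried.
_ATTEMPT = re.compile(
    r"Service 'SparkUI' could not bind on port (\d+)\. Attempting port (\d+)\."
)


def last_attempted_ui_port(log_text: str) -> Optional[int]:
    """Return the port named in the last 'Attempting port' warning, or None if absent."""
    attempts = _ATTEMPT.findall(log_text)
    return int(attempts[-1][1]) if attempts else None
```

When the function returns None, no bind warning was logged, so the UI is most likely on the port it was first configured to use.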
