Spark structured streaming job stuck for hours without getting killed
I have a structured streaming job which reads from Kafka, performs aggregations, and writes to HDFS. The job runs in cluster mode on YARN. I am using Spark 2.4. Every 2-3 days this job gets stuck. It doesn't fail but gets stuck at some microbatch; the microbatch doesn't even start. The driver keeps printing the following log line repeatedly for hours:
Got an error when resolving hostNames. Falling back to /default-rack for all.
When I kill the streaming job and start it again, it runs fine. How do I fix this?
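For context, a minimal PySpark sketch of the kind of job described above (Kafka source, windowed aggregation, Parquet sink on HDFS). The broker address, topic name, schema, paths, and window sizes are all placeholders, not details from the original post; running it requires a cluster with the `spark-sql-kafka` package on the classpath.

```python
# Hedged sketch of a Kafka -> aggregation -> HDFS structured streaming job.
# All endpoints, topics, and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-aggregation")
         .getOrCreate())

# Read from Kafka; the source exposes key/value (binary) and timestamp columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder topic
          .load())

# Example aggregation: count messages per key per 5-minute event-time window.
# Append output mode with an aggregation requires a watermark.
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"), F.col("key"))
          .count())

# Write to HDFS. The checkpoint location is what lets a restarted job
# resume from the microbatch where the previous run was killed.
query = (counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "hdfs:///data/agg")                # placeholder
         .option("checkpointLocation", "hdfs:///chk/agg")   # placeholder
         .start())

query.awaitTermination()
```

Because the checkpoint directory records Kafka offsets, killing the stuck job and restarting it (as the question describes) resumes from the last committed microbatch rather than reprocessing the topic.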
See this issue: https://issues.apache.org/jira/browse/SPARK-28005. It is fixed in Spark 3.0. It seems this happens because there are no active executors.