Spark structured streaming job stuck for hours without getting killed
I have a structured streaming job which reads from Kafka, performs aggregations, and writes to HDFS. The job runs in cluster mode on YARN. I am using Spark 2.4. Every 2-3 days this job gets stuck. It doesn't fail but gets stuck at some microbatch; the microbatch doesn't even start. The driver keeps printing the following log line repeatedly for hours:
Got an error when resolving hostNames. Falling back to /default-rack for all.
When I kill the streaming job and start it again, it runs fine. How do I fix this?
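For context, a minimal PySpark sketch of the kind of job described above (Kafka source, windowed aggregation, Parquet sink on HDFS). The broker address, topic name, schema, paths, and window sizes are all placeholders, not details from the original post; running it requires a cluster with the `spark-sql-kafka` package on the classpath.

```python
# Hedged sketch of a Kafka -> aggregation -> HDFS structured streaming job.
# All endpoints, topics, and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-aggregation")
         .getOrCreate())

# Read from Kafka; the source exposes key/value (binary) and timestamp columns.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "events")                     # placeholder topic
          .load())

# Example aggregation: count messages per key per 5-minute event-time window.
# Append output mode with an aggregation requires a watermark.
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"), F.col("key"))
          .count())

# Write to HDFS. The checkpoint location is what lets a restarted job
# resume from the microbatch where the previous run was killed.
query = (counts.writeStream
         .outputMode("append")
         .format("parquet")
         .option("path", "hdfs:///data/agg")                # placeholder
         .option("checkpointLocation", "hdfs:///chk/agg")   # placeholder
         .start())

query.awaitTermination()
```

Because the checkpoint directory records Kafka offsets, killing the stuck job and restarting it (as the question describes) resumes from the last committed microbatch rather than reprocessing the topic.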
See this issue: https://issues.apache.org/jira/browse/SPARK-28005. It is fixed in Spark 3.0. It seems this happens because there are no active executors.