
Spark structured streaming job stuck for hours without getting killed

I have a structured streaming job which reads from Kafka, performs aggregations, and writes to HDFS. The job runs in cluster mode on YARN, on Spark 2.4. Every 2-3 days the job gets stuck: it doesn't fail, but hangs at some micro-batch. The micro-batch never even starts, and the driver keeps printing the following log line for hours:

 Got an error when resolving hostNames. Falling back to /default-rack for all.

When I kill the streaming job and start it again, it runs fine. How can I fix this?
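For context, the job has the shape of a typical Kafka → aggregate → HDFS pipeline. A minimal sketch of that structure (the broker address, topic name, window sizes, paths, and trigger interval here are all assumptions, not details from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

object KafkaAggToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-agg-to-hdfs").getOrCreate()
    import spark.implicits._

    // Read from Kafka (broker and topic are placeholders)
    val source = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Windowed aggregation with a watermark so old state can be dropped
    val counts = source
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"), $"value")
      .count()

    // Write each completed window out to HDFS as Parquet
    val query = counts.writeStream
      .outputMode("append")
      .format("parquet")
      .option("path", "hdfs:///data/agg")
      .option("checkpointLocation", "hdfs:///checkpoints/agg")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()

    query.awaitTermination()
  }
}
```

This requires a Spark/YARN cluster with Kafka reachable, so it is illustrative rather than directly runnable here.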

See this issue: https://issues.apache.org/jira/browse/SPARK-28005. It is fixed in Spark 3.0. It seems this happens when there are no active executors, so the driver keeps retrying rack resolution and logging that message instead of making progress.
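If upgrading to Spark 3.0 isn't an option right away, one common mitigation is to detect the stall yourself and let YARN restart the application. A sketch using Spark's `StreamingQueryListener` (the class name, stall threshold, and restart-via-exit strategy are my assumptions, not part of the linked fix; the query construction is elided):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

object StallWatchdog {
  @volatile private var lastProgressMs = System.currentTimeMillis()

  // Updates a timestamp every time a micro-batch reports progress.
  class ProgressListener extends StreamingQueryListener {
    override def onQueryStarted(event: QueryStartedEvent): Unit = ()
    override def onQueryProgress(event: QueryProgressEvent): Unit = {
      lastProgressMs = System.currentTimeMillis()
    }
    override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-with-watchdog").getOrCreate()
    spark.streams.addListener(new ProgressListener)

    val query = ??? // start your streaming query here, as before

    val maxStallMs = 30 * 60 * 1000L // assumed threshold: 30 minutes

    // Poll: if no micro-batch has completed within the threshold, exit
    // non-zero so YARN (with spark.yarn.maxAppAttempts > 1) relaunches
    // the application -- the same effect as the manual kill-and-restart.
    while (query.isActive) {
      query.awaitTermination(60 * 1000L)
      if (System.currentTimeMillis() - lastProgressMs > maxStallMs) {
        sys.exit(1)
      }
    }
  }
}
```

This only automates the restart you are already doing by hand; the actual fix is the upgrade.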
