I am trying to load a JSON file in an EMR notebook with a Spark kernel. I am using a very large, proven EMR cluster that I have worked with before, so the cluster size/computation power is not the issue. The simple code below is enough to reproduce my issue:
val df = spark.read.json("s3a://src/main/resources/zipcodes.json")
Here is the JSON file I am trying to load. It is extremely small. https://raw.githubusercontent.com/spark-examples/spark-scala-examples/71d2db89ffb24db6f01eb1fa12286bfbb37c44c4/src/main/resources/zipcodes.json
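As an aside, one thing worth knowing about this one-liner: when no schema is supplied, `spark.read.json` triggers a Spark job just to scan the file and infer the schema, so even this single line blocks until executors actually run a task. A hedged sketch that sidesteps the inference pass by declaring the schema up front (the bucket path is copied from the question, and the field list is only an illustrative subset of the file's columns):

```scala
import org.apache.spark.sql.types._

// Declaring a schema lets spark.read.json skip the eager
// schema-inference job it would otherwise launch.
val zipSchema = StructType(Seq(
  StructField("RecordNumber", LongType),
  StructField("Zipcode", LongType),
  StructField("ZipCodeType", StringType),
  StructField("City", StringType),
  StructField("State", StringType)
))

val df = spark.read
  .schema(zipSchema)
  .json("s3a://src/main/resources/zipcodes.json")

df.printSchema() // prints the declared schema without running a job
```

This does not fix a cluster that cannot schedule tasks at all, but it can help distinguish "hanging on schema inference" from "hanging on any task at all."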
I let it run for 1 hour. In the bottom-left corner, the status reads `Spark | Busy`, and the circle in the top right is full, indicating that the kernel is working. However, the Spark Job Progress panel shows a Task Progress bar that never advances. Any advice?
The problem was not the JSON file. In an attempt to fix the issue, I simply cloned my problematic EMR cluster with the exact same steps/configuration, attached my EMR notebook to the clone, and re-ran the exact same code against the exact same file. It worked nearly instantaneously. So the problem was with the original cluster, although I never determined what exactly was wrong with it.