
AWS EMR notebook Spark kernel infinitely loads small JSON file

I am trying to load a JSON file in an EMR notebook with a Spark kernel. I am using a very large, proven EMR cluster that I have worked with before, so the cluster size/computation power is not the issue. The simple code below is enough to reproduce my issue:

val df = spark.read.json("s3a://src/main/resources/zipcodes.json")

Here is the JSON file I am trying to load. It is extremely small. https://raw.githubusercontent.com/spark-examples/spark-scala-examples/71d2db89ffb24db6f01eb1fa12286bfbb37c44c4/src/main/resources/zipcodes.json

I let it run for 1 hour. In the bottom left corner, the status reads Spark | Busy, and the circle in the top right is filled, indicating that the kernel is working. However, the Spark Job Progress panel shows a Task Progress bar that never advances. Any advice?

The problem was not the JSON file. To fix the issue, I simply cloned my problematic EMR cluster with the exact same steps/configuration, attached my EMR notebook to the clone, and re-ran the same code on the same file. It worked almost instantly. The problem lay with the original cluster, although I never determined exactly what it was.
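
Since the culprit turned out to be the cluster rather than the file, a quick way to isolate this kind of hang is to first run a trivial job that touches no external storage: if even that never finishes, the cluster or kernel is at fault and the input file is irrelevant. The sketch below assumes the spark session that the EMR notebook kernel provides; the second read is the question's path with only the URI scheme changed, as a secondary check, since EMR's native EMRFS connector registers the s3:// scheme while s3a:// is Hadoop's connector.

// Sanity check: a trivial job with no S3 involvement. If this also hangs,
// the cluster/kernel is the problem, not the JSON file.
val n = spark.range(1000).count()
println(s"trivial job finished, count = $n")

// Secondary check: read the same path via the s3:// scheme (EMRFS) instead
// of s3a:// (Hadoop's connector) to rule out a connector-level hang.
val df = spark.read.json("s3://src/main/resources/zipcodes.json")
df.printSchema()
df.show(5)

If the range/count job completes but both reads hang, the problem is narrowed to S3 access from that cluster; if the range/count job itself hangs, as appears to have happened here, no amount of tweaking the read will help and recreating the cluster is a reasonable fix.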
