I have some large JSON files in a specific S3 bucket folder. Each file contains one JSON object per line. I tried to read one using spark.read.json("s3a://bucket/prefix/file.json") but got a "Premature end of Content-Length delimited message body" error.
I am using Spark 2.4.7 with Hadoop distribution 2.7.1, Java 1.8, and Python 3.7.
Try this:
df = (
    spark.read
    .option("multiLine", True)  # Python boolean (Scala would use `true`)
    .option("mode", "PERMISSIVE")  # keep malformed records instead of failing
    .json("/path/file.json")
)
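To see what the multiLine option changes, it may help to compare the two file layouts outside Spark. The sketch below uses only Python's standard json module and made-up sample records (the data and the helper names parse_json_lines and parse_single_document are hypothetical). It illustrates the file formats, not Spark's internals: with multiLine off, Spark expects one JSON object per line, as described in the question, while multiLine is meant for a single JSON document that spans several lines.

```python
import json

# Hypothetical sample: JSON Lines, one complete object per line
# (the layout described in the question).
json_lines_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# Hypothetical sample: one JSON document (an array) spread over several lines.
multi_line_text = '[\n  {"id": 1, "name": "a"},\n  {"id": 2, "name": "b"}\n]\n'


def parse_json_lines(text):
    """Parse line-delimited JSON: every non-empty line is its own object."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_single_document(text):
    """Parse the whole text as one JSON document."""
    return json.loads(text)


records_from_lines = parse_json_lines(json_lines_text)
records_from_document = parse_single_document(multi_line_text)

# Treating JSON Lines text as a single document fails in the json module,
# because a second top-level value follows the first one.
try:
    parse_single_document(json_lines_text)
    whole_file_parse_ok = True
except json.JSONDecodeError:
    whole_file_parse_ok = False
```

If the files really hold one object per line, it is probably worth trying the read on a small file both with and without multiLine and comparing the row counts before running it on the large files.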