I'm learning Spark in Scala. I have a JSON file as follows:
[
{
"name": "ali",
"age": "13",
"phone": "09123455737",
"sex": "m"
},{
"name": "amir",
"age": "24",
"phone": "09123475737",
"sex": "m"
}
]
and there is just this code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonFile = sqlContext.read.json("path-to-json-file")
All I get back is corrupted_row : String and nothing else. When I put each person (each object) on a single line, the code works fine.
How can I read JSON that spans multiple lines with sqlContext in Spark?
You have to read it into an RDD yourself and then convert it to a Dataset:
spark.read.json(sparkContext.wholeTextFiles(...).values)
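A fuller sketch of that one-liner, assuming Spark 2.2 or later with a SparkSession and a local master. The object name, the helper names and the temporary file are made up for illustration. It uses the Dataset[String] overload of json; on older versions the RDD[String] overload shown above plays the same role.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}

object WholeTextFilesJsonExample {
  // Hypothetical helper: writes the question's sample data to a temporary file.
  def writeSample(): Path = {
    val json =
      """[
        |{ "name": "ali", "age": "13", "phone": "09123455737", "sex": "m" },
        |{ "name": "amir", "age": "24", "phone": "09123475737", "sex": "m" }
        |]""".stripMargin
    val path = Files.createTempFile("people", ".json")
    Files.write(path, json.getBytes(StandardCharsets.UTF_8))
    path
  }

  def readWholeFiles(spark: SparkSession, path: String): DataFrame = {
    // wholeTextFiles yields (fileName, fileContent) pairs; keep only the content,
    // so each element holds one complete JSON document instead of one line.
    val contents = spark.sparkContext.wholeTextFiles(path).values
    spark.read.json(spark.createDataset(contents)(Encoders.STRING))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("whole-text-files-json")
      .master("local[*]") // assumption: local run for the example
      .getOrCreate()
    try {
      val people = readWholeFiles(spark, writeSample().toUri.toString)
      people.printSchema()
      people.show()
    } finally {
      spark.stop()
    }
  }
}
```

Note that wholeTextFiles loads each file into memory as a single string, so this approach suits many small files better than one very large file.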
This problem occurs because your JSON records span multiple lines. By default spark.read.json expects each record to be on a single line, but this is configurable:
Set the option on the reader before calling json, i.e. spark.read.option("multiLine", true).json("path-to-json-file"). The multiLine option is available from Spark 2.2 onward.
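As a minimal, self-contained sketch of the multiLine approach, assuming Spark 2.2 or later and a local master (the object name, helper names and temporary file are hypothetical, not part of the question):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

object MultiLineJsonExample {
  // Hypothetical helper: writes the question's sample data to a temporary file.
  def writeSample(): Path = {
    val json =
      """[
        |{
        |"name": "ali",
        |"age": "13",
        |"phone": "09123455737",
        |"sex": "m"
        |},{
        |"name": "amir",
        |"age": "24",
        |"phone": "09123475737",
        |"sex": "m"
        |}
        |]""".stripMargin
    val path = Files.createTempFile("people", ".json")
    Files.write(path, json.getBytes(StandardCharsets.UTF_8))
    path
  }

  // The option has to be set on the reader before json(...) is called,
  // because json(...) already returns the DataFrame.
  def readMultiLine(spark: SparkSession, path: String): DataFrame =
    spark.read.option("multiLine", true).json(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multiline-json")
      .master("local[*]") // assumption: local run for the example
      .getOrCreate()
    try {
      val people = readMultiLine(spark, writeSample().toUri.toString)
      people.printSchema()
      people.show()
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell the spark session already exists, so only the readMultiLine line is needed there.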