
Read JSON files from a multi-line file in Spark Scala

I'm learning Spark in Scala. I have a JSON file as follows:

[
  {
    "name": "ali",
    "age": "13",
    "phone": "09123455737",
    "sex": "m"
  },{
    "name": "amir",
    "age": "24",
    "phone": "09123475737",
    "sex": "m"
  }
]

and I have just this code:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val jsonFile = sqlContext.read.json("path-to-json-file")

I just receive a `_corrupt_record: string` column and nothing else, but when I put every person (object) on a single line, the code works fine.

How can I read multi-line JSON with a SQLContext in Spark?

You have to read it into an RDD yourself and then convert it to a Dataset:

spark.read.json(sparkContext.wholeTextFiles(...).values)
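A minimal sketch of this approach, assuming a local SparkSession (the file path is a placeholder; `wholeTextFiles` returns each file as a single `(path, content)` pair, so the JSON array may span any number of lines):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object WholeFileJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multiline-json")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // wholeTextFiles reads the entire file as one record; .values drops
    // the path key, leaving the full JSON text per file.
    val jsonText: Dataset[String] = spark.sparkContext
      .wholeTextFiles("path-to-json-file")
      .values
      .toDS()

    // json(Dataset[String]) parses each record as JSON (Spark 2.2+);
    // the RDD[String] overload used above is deprecated in newer versions.
    val people = spark.read.json(jsonText)
    people.show()
  }
}
```

Note that `wholeTextFiles` loads each file into memory as one record, so this only suits files that fit comfortably on a single executor.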

This problem is caused by the multi-line JSON rows. By default, spark.read.json expects each row to be on a single line, but this is configurable:

You can set the multiLine option: spark.read.option("multiLine", true).json("path-to-json-file") (note that .option must be called on the reader before .json, not on the resulting DataFrame).
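A short sketch of the multiLine option in context, assuming an existing SparkSession named `spark` and a placeholder path:

```scala
// multiLine tells the JSON data source that a single record (or the
// whole array) may span several lines, instead of one JSON object per line.
val people = spark.read
  .option("multiLine", true)
  .json("path-to-json-file")

people.printSchema()
// Expected fields inferred from the sample file: age, name, phone, sex
people.show()
```

This keeps the file as-is and avoids the manual RDD round-trip, at the cost of Spark reading each file as a whole rather than splitting it by line.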
