Read Json file with Scala/Spark

I am trying to read a JSON file, for example:

{
  "id1": {
    "a": "7",
    "b": "3",
    "c": "10",
    "d": "10",
    "e": "15",
    "f": "11",
    "g": "2",
    "h": "7",
    "i": "5",
    "j": "14"
  },
  "id2": {
    "a": "3",
    "b": "7",
    "c": "12",
    "d": "4",
    "e": "10",
    "f": "4",
    "g": "13",
    "h": "4",
    "i": "1",
    "j": "13"
  },
  "id3": {
    "a": "10",
    "b": "6",
    "c": "1",
    "d": "1",
    "e": "13",
    "f": "12",
    "g": "9",
    "h": "6",
    "i": "7",
    "j": "4"
  }
}

When I process it with spark.read.json("file.json"), it returns a single record in this format:

+-----------------------------------+----------------------------------+---------------------------------+
|id1                                |id2                               |id3                              |
+-----------------------------------+----------------------------------+---------------------------------+
|{7, 3, 10, 10, 15, 11, 2, 7, 5, 14}|{3, 7, 12, 4, 10, 4, 13, 4, 1, 13}|{10, 6, 1, 1, 13, 12, 9, 6, 7, 4}|
+-----------------------------------+----------------------------------+---------------------------------+
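(For reference, a minimal read that would produce the single nested record above might look like the sketch below; the multiLine option is an assumption, needed because the JSON object spans several lines, whereas spark.read.json expects one object per line by default.)

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-json").master("local[*]").getOrCreate()

// Read the whole multi-line JSON object as one record;
// each top-level key (id1, id2, id3) becomes a struct column.
val raw = spark.read
  .option("multiLine", true)
  .json("file.json")

raw.show(truncate = false)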

I would like the result to be something like this when processing the file:

+-----+------+------+------+------+------+------+------+------+------+-------+
| id  | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 | col10 |
+-----+------+------+------+------+------+------+------+------+------+-------+
| id1 |  7   |  3   |  10  |  10  |  15  |  11  |  2   |  7   |  5   |  14   |
+-----+------+------+------+------+------+------+------+------+------+-------+
| id2 |  3   |  7   |  12  |  4   |  10  |  4   |  13  |  4   |  1   |  13   |
+-----+------+------+------+------+------+------+------+------+------+-------+
| id3 |  10  |  6   |  1   |  1   |  13  |  12  |  9   |  6   |  7   |  4    |
+-----+------+------+------+------+------+------+------+------+------+-------+

Is there a simple and fast way to do this?

Thanks.

Are you able to modify your input JSON file? If so, making it an array of JSON objects would do the job with spark.read.json(), as shown in the sketch after the example:

[    
    {
      "id": "id1",  
      "a": "7",
      "b": "3",
      "c": "10",
      "d": "10",
      "e": "15",
      "f": "11",
      "g": "2",
      "h": "7",
      "i": "5",
      "j": "14"
    },
    ...
]
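A minimal sketch of reading such an array-of-objects file, assuming it is saved as "file.json" (the path and the column order are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-json").master("local[*]").getOrCreate()

// multiLine = true because the array is pretty-printed over several lines.
val df = spark.read
  .option("multiLine", true)
  .json("file.json")

// Put the id column first, followed by the ten value columns.
df.select("id", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
  .show(truncate = false)

This yields one row per id, with the a..j fields as ordinary columns, which matches the desired layout (column names aside).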
