Amazon Redshift COPY command loads only 1 row from JSON

Question

I'm trying to COPY a JSON file of GA data into Redshift. The file has the following structure:

{
  "reports": [
    {
      "columnHeader": {
        "dimensions": [
          "ga:date",
          "ga:country"
        ],
        "metricHeader": {
          "metricHeaderEntries": [
            {
              "name": "ga:users",
              "type": "INTEGER"
            },
            {
              "name": "ga:newUsers",
              "type": "INTEGER"
            },
            {
              "name": "ga:sessionduration",
              "type": "TIME"
            }
          ]
        }
      },
      "data": {
        "rows": [
          {
            "dimensions": [
              "20200731",
              "(not set)"
            ],
            "metrics": [
              {
                "values": [
                  "7",
                  "6",
                  "5.0"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20200731",
              "Albania"
            ],
            "metrics": [
              {
                "values": [
                  "2",
                  "1",
                  "0.0"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20200731",
              "Algeria"
            ],
            "metrics": [
              {
                "values": [
                  "1",
                  "1",
                  "224.0"
                ]
              }
            ]
          },

If I use the following JSONPaths file, COPY loads only the first object in the rows array:

{
    "jsonpaths": [
       "$['reports'][0]['data']['rows'][0]['dimensions'][0]",
       "$['reports'][0]['data']['rows'][0]['dimensions'][1]",
       "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][0]" ,
       "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][1]" ,
       "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][2]" 
    ]
}

I tried changing ['rows'][0] to ['rows'][*] and to ['rows'], but neither helped. How should I change the JSON paths to load all of the data from the file?

Thanks!

Answer 1

Redshift sees only the first row because every path pins the rows index to [0] (the first item), and the file is a single top-level JSON object. A COPY JSONPaths file does not accept wildcards such as [*], so you cannot currently ingest JSON that is nested like this directly.
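To see why a single row comes out, it can help to mimic how the JSONPaths file is applied: each expression resolves to exactly one value, so one top-level object yields one row. The sketch below is plain Python, not Redshift code. `resolve` is a hypothetical helper written for this illustration, and the sample document is trimmed from the question.

```python
import re

# Trimmed copy of the report from the question (two rows only).
doc = {
    "reports": [{
        "data": {"rows": [
            {"dimensions": ["20200731", "(not set)"],
             "metrics": [{"values": ["7", "6", "5.0"]}]},
            {"dimensions": ["20200731", "Albania"],
             "metrics": [{"values": ["2", "1", "0.0"]}]},
        ]}
    }]
}

# The JSONPaths from the question, unchanged.
jsonpaths = [
    "$['reports'][0]['data']['rows'][0]['dimensions'][0]",
    "$['reports'][0]['data']['rows'][0]['dimensions'][1]",
    "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][0]",
    "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][1]",
    "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][2]",
]

def resolve(document, path):
    """Hypothetical helper: follow a bracket-notation path to one value."""
    value = document
    # Each step is either ['name'] (object key) or [n] (array index).
    for key, index in re.findall(r"\['([^']+)'\]|\[(\d+)\]", path):
        value = value[key] if key else value[int(index)]
    return value

# One value per path, so the whole document collapses to a single row.
row = [resolve(doc, p) for p in jsonpaths]
print(row)
```

The second entry of `rows` is never visited, because no path refers to index 1.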

To access all of the nested arrays in this file, you will need to define it as an external table and then use the nested data query syntax.

Start here: "Tutorial: Querying nested data with Amazon Redshift Spectrum"
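A different route from the external table, and not part of the original answer, is to flatten the report before loading it. COPY loads each top-level JSON object in a file as one row, so writing one flat object per entry in rows gives it something it can ingest. The following is a minimal sketch under assumptions: `flatten_report` and `to_ndjson` are hypothetical helper names, the "ga:" prefix is stripped and keys are lowercased on the guess that the target columns are named that way, and only the first element of each row's metrics array is read.

```python
import json

def flatten_report(report_doc):
    """Yield one flat dict per entry in reports[*].data.rows."""
    for report in report_doc.get("reports", []):
        header = report["columnHeader"]
        names = list(header["dimensions"])
        names += [e["name"] for e in header["metricHeader"]["metricHeaderEntries"]]
        # Assumption: "ga:newUsers" should become the key "newusers".
        names = [n.split(":", 1)[-1].lower() for n in names]
        for row in report["data"].get("rows", []):
            # Assumption: a single metrics entry per row, as in the question.
            values = list(row["dimensions"]) + list(row["metrics"][0]["values"])
            yield dict(zip(names, values))

def to_ndjson(report_doc):
    """Serialize the flattened rows as one JSON object per line."""
    return "\n".join(json.dumps(rec) for rec in flatten_report(report_doc))

# Trimmed copy of the report from the question.
sample = {
    "reports": [{
        "columnHeader": {
            "dimensions": ["ga:date", "ga:country"],
            "metricHeader": {"metricHeaderEntries": [
                {"name": "ga:users", "type": "INTEGER"},
                {"name": "ga:newUsers", "type": "INTEGER"},
                {"name": "ga:sessionduration", "type": "TIME"},
            ]},
        },
        "data": {"rows": [
            {"dimensions": ["20200731", "(not set)"],
             "metrics": [{"values": ["7", "6", "5.0"]}]},
            {"dimensions": ["20200731", "Albania"],
             "metrics": [{"values": ["2", "1", "0.0"]}]},
        ]},
    }]
}

print(to_ndjson(sample))
```

The flattened file would still need to be uploaded to S3 and loaded with COPY. All values stay strings here, exactly as they appear in the GA report.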

Answer 1: accepted, 2020-08-04 14:51:40
