Amazon Redshift gets only 1 row from JSON with COPY command

I'm trying to copy a JSON file of Google Analytics (GA) data into Redshift. The file has the following structure:

{
  "reports": [
    {
      "columnHeader": {
        "dimensions": [
          "ga:date",
          "ga:country"
        ],
        "metricHeader": {
          "metricHeaderEntries": [
            {
              "name": "ga:users",
              "type": "INTEGER"
            },
            {
              "name": "ga:newUsers",
              "type": "INTEGER"
            },
            {
              "name": "ga:sessionduration",
              "type": "TIME"
            }
          ]
        }
      },
      "data": {
        "rows": [
          {
            "dimensions": [
              "20200731",
              "(not set)"
            ],
            "metrics": [
              {
                "values": [
                  "7",
                  "6",
                  "5.0"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20200731",
              "Albania"
            ],
            "metrics": [
              {
                "values": [
                  "2",
                  "1",
                  "0.0"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20200731",
              "Algeria"
            ],
            "metrics": [
              {
                "values": [
                  "1",
                  "1",
                  "224.0"
                ]
              }
            ]
          },

If I use the following JSONPaths file, COPY loads only the first object in the rows array:

{
    "jsonpaths": [
        "$['reports'][0]['data']['rows'][0]['dimensions'][0]",
        "$['reports'][0]['data']['rows'][0]['dimensions'][1]",
        "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][0]",
        "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][1]",
        "$['reports'][0]['data']['rows'][0]['metrics'][0]['values'][2]"
    ]
}

I tried changing ['rows'][0] to ['rows'][*] and to ['rows'], but neither helped. How should I change the JSON paths to load all of the data from the file?

Thanks!

Redshift sees only the first row because every path pins the index [0] on rows, which selects the first item of that array. COPY's JSONPath support does not include wildcards such as [*], so JSON nested like this cannot currently be ingested directly.

To access all of the nested arrays in this file, you will need to define it as an external table and then use the nested data query syntax.

Start here: "Tutorial: Querying nested data with Amazon Redshift Spectrum"
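
If an external table is not an option, another common workaround (not part of the answer above, so treat it as a sketch) is to flatten the file before loading it. COPY treats each top-level JSON object in a file as one row, so rewriting the report as one flat object per line gives it something it can ingest. The function names `flatten_ga_report` and `to_json_lines` are hypothetical, and the choice to drop the "ga:" prefix and lowercase the keys is an assumption made so the keys can line up with ordinary column names:

```python
import json


def _column_name(ga_name):
    # Assumption: "ga:newUsers" -> "newusers", so keys can match column names.
    return ga_name.split(":", 1)[-1].lower()


def flatten_ga_report(doc):
    """Yield one flat dict per entry in reports[*].data.rows."""
    for report in doc.get("reports", []):
        header = report["columnHeader"]
        dim_names = [_column_name(d) for d in header.get("dimensions", [])]
        metric_names = [
            _column_name(m["name"])
            for m in header["metricHeader"]["metricHeaderEntries"]
        ]
        for row in report.get("data", {}).get("rows", []):
            record = dict(zip(dim_names, row.get("dimensions", [])))
            # metrics[0] holds the values for the first (here, only) date range.
            record.update(zip(metric_names, row["metrics"][0]["values"]))
            yield record


def to_json_lines(doc):
    """Serialize the flattened rows as newline-delimited JSON objects."""
    return "\n".join(json.dumps(record) for record in flatten_ga_report(doc))
```

Writing the output of `to_json_lines(json.load(f))` to a new file and uploading that file should let COPY load every row, either with the 'auto' JSON option (assuming the table's column names match the keys) or with a JSONPaths file that uses top-level paths only. All values stay strings here, as in the original report; type conversion is left to COPY or to a later step.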
