简体   繁体   中英

Loading nested array into bigquery from public google cloud dataset

I'm trying to load a public dataset from Google Cloud into BigQuery ( quickdraw_dataset ). The data is in JSON format as below:

 { 
    "key_id":"5891796615823360",
    "word":"nose",
    "countrycode":"AE",
    "timestamp":"2017-03-01 20:41:36.70725 UTC",
    "recognized":true,
    "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
  }

The issue that I'm running into is that the "drawing" field is a nested array. I gather from reading other posts that you can't read arrays into BigQuery? This post suggests that one way around this issue is to read in the array as a string. But, when I use the following schema, I get this error: `

 [
    {
        "name": "key_id",
        "type": "STRING"
    },
    {
        "name": "word",
        "type": "STRING"
    },
    {
        "name": "countrycode",
        "type": "STRING"
    },
    {
        "name": "timestamp",
        "type": "STRING"
    },
    {
        "name": "recognized",
        "type": "BOOLEAN"
    },
    {
        "name": "drawing",
        "type": "STRING"

    }
]

Error while reading data, error message: JSON parsing error in row starting at position 0: Array specified for non-repeated field: drawing.

Is there a way to read this dataset into BigQuery?

Thanks in advance!

Load the whole row as a CSV, then parse inside BigQuery.

Load:

bq load --F \\t temp.eraser gs://quickdraw_dataset/full/simplified/eraser.ndjson row

Query:

SELECT JSON_EXTRACT_SCALAR(row, '$.countrycode') a
  , JSON_EXTRACT_SCALAR(row, '$.word') b
  , JSON_EXTRACT_ARRAY(row, '$.drawing')[OFFSET(0)] c
FROM temp.eraser

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM