
How to create nested JSON format table from flat table in BigQuery?

I have a wide flat table stored in Google BigQuery, in a format similar to the following:

log_date:integer,sessionid:integer,computer:string,ip:string,event_id:integer,amount:float

I am trying to create this table in a hierarchical nested format with 2 levels of nesting, as follows:

[
  {
    "name": "log_date",
    "type": "integer"
  },
  {
    "name": "session",
    "type": "record",
    "mode": "repeated",
    "fields": [
      {
        "name": "sessionid",
        "type": "integer"
      },
      {
        "name": "computer",
        "type": "string"
      },
      {
        "name": "ip",
        "type": "string"
      },
      {
        "name": "event",
        "type": "record",
        "mode": "repeated",
        "fields": [
          {
            "name": "event_id",
            "type": "integer"
          },
          {
            "name": "amount",
            "type": "float"
          }
        ]
      }
    ]
  }
]

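What is the best way to generate a JSON-formatted data file from the BigQuery table? Is there a different and faster approach than the following?

1. Download the table to an external CSV file
2. Build the JSON records and write them to an external file
3. Upload the external JSON file into a new BigQuery table

Is there a direct process for generating the JSON from the existing table?

Thank you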

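There is currently no way to automatically convert the data into a nested format. If you want to get the data out as JSON rather than CSV, you can use the export command (bq extract) and set the --destination_format flag to NEWLINE_DELIMITED_JSON, for example: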

bq extract \
    --destination_format=NEWLINE_DELIMITED_JSON \
    yourdataset.table \
    gs://your_bucket/result*.json 
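
With this approach the exported file is still flat (one JSON object per input row), so the nesting has to happen outside BigQuery. The following is a minimal sketch of that step, not a definitive implementation. It assumes each exported line carries the six columns from the question, that field values can be copied through unchanged, and that the data fits in memory; the function name nest_rows is hypothetical.

```python
import json


def nest_rows(flat_lines):
    """Turn flat newline-delimited JSON rows into nested records.

    Yields one JSON string per log_date, shaped like the nested
    schema in the question (session -> event).
    """
    by_date = {}  # log_date -> {(sessionid, computer, ip) -> session dict}
    for line in flat_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        sessions = by_date.setdefault(row["log_date"], {})
        key = (row["sessionid"], row["computer"], row["ip"])
        session = sessions.setdefault(
            key,
            {
                "sessionid": row["sessionid"],
                "computer": row["computer"],
                "ip": row["ip"],
                "event": [],
            },
        )
        session["event"].append(
            {"event_id": row["event_id"], "amount": row["amount"]}
        )
    for log_date, sessions in by_date.items():
        yield json.dumps({"log_date": log_date, "session": list(sessions.values())})
```

Writing each yielded string on its own line gives a newline-delimited JSON file, which could then be loaded into a new table with bq load, using --source_format=NEWLINE_DELIMITED_JSON and the nested schema from the question.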

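This can be done with ARRAY_AGG in standard SQL.

Note that if you want to nest in several levels, you need an intermediate step such as a common table expression, because an ARRAY_AGG cannot directly contain another ARRAY_AGG.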

WITH DATA AS (
 SELECT 1 AS log_date, 10 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 100 AS event_id, 1 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 101 AS event_id, 2 AS amount
 UNION ALL SELECT 1 AS log_date, 11 AS sessionid, 'b' AS computer, '1.2.3.5' AS ip, 102 AS event_id, 3 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 200 AS event_id, 4 AS amount
 UNION ALL SELECT 2 AS log_date, 20 AS sessionid, 'a' AS computer, '1.2.3.4' AS ip, 201 AS event_id, 5 AS amount
 UNION ALL SELECT 2 AS log_date, 21 AS sessionid, 'c' AS computer, '1.2.3.6' AS ip, 202 AS event_id, 6 AS amount ),
inner_Aggregate AS (
  SELECT
    log_date,
    sessionid,
    computer,
    ip,
    ARRAY_AGG(STRUCT(event_id, amount)) AS event
  FROM
    DATA
  GROUP BY
    log_date,
    sessionid,
    computer,
    ip )
SELECT
  log_date,
  ARRAY_AGG(STRUCT(sessionid, computer, ip, event )) AS session
FROM
  inner_Aggregate
GROUP BY
  log_date
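
To make the two-stage shape of the query easier to follow, here is a rough Python mirror of it on the same six sample rows. This is only an illustration of the grouping logic, not how BigQuery executes the query; the function names inner_aggregate and outer_aggregate are hypothetical, and the element order shown by Python is not something ARRAY_AGG guarantees without an ORDER BY.

```python
from collections import defaultdict

# Same six sample rows as the DATA common table expression:
# (log_date, sessionid, computer, ip, event_id, amount)
ROWS = [
    (1, 10, "a", "1.2.3.4", 100, 1),
    (1, 11, "b", "1.2.3.5", 101, 2),
    (1, 11, "b", "1.2.3.5", 102, 3),
    (2, 20, "a", "1.2.3.4", 200, 4),
    (2, 20, "a", "1.2.3.4", 201, 5),
    (2, 21, "c", "1.2.3.6", 202, 6),
]


def inner_aggregate(rows):
    """Stage 1: group by session columns and collect the events."""
    groups = defaultdict(list)
    for log_date, sessionid, computer, ip, event_id, amount in rows:
        groups[(log_date, sessionid, computer, ip)].append(
            {"event_id": event_id, "amount": amount}
        )
    return [
        {"log_date": d, "sessionid": s, "computer": c, "ip": i, "event": events}
        for (d, s, c, i), events in groups.items()
    ]


def outer_aggregate(session_rows):
    """Stage 2: group by log_date and collect sessions with their events."""
    groups = defaultdict(list)
    for row in session_rows:
        groups[row["log_date"]].append(
            {k: row[k] for k in ("sessionid", "computer", "ip", "event")}
        )
    return [{"log_date": d, "session": sessions} for d, sessions in groups.items()]


result = outer_aggregate(inner_aggregate(ROWS))
```

The outer step only sees rows that already carry a finished event list, which is the role the inner_Aggregate expression plays in the query.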
