Flatten multiple named arrays within a VARIANT JSON column in Snowflake

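I have a web scraper that dumps data into a VARIANT column in a Snowflake database. It creates the page data, and then creates JSON arrays for the various tables found on the page.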

Below is an example of the kind of JSON I would find, using a soccer analogy:

    {
  "dom_url": "https://www.soccertables.com/european_tables",
  "event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
  "event_utc_time": "2020-05-11 09:01:14.821",
  "ip_address": "125.238.134.96",
  "table_1": [
    {
      "position": "1",
      "team_name": "Liverpool",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Man. City",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "table_2": [
    {
      "position": "1",
      "team_name": "Bayern Munich",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Bayer Leverkussen",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "referrer_url": "https://www.soccertables.com"
}

Ideally, I would like the output to be a flat relational table:

    table_name  position  team_name      games_played  etc...
    table_1     1         Liverpool      29            ...
    table_1     2         Man. City      29            ...
    table_2     1         Bayern Munich  29            ...
    ...

I know that if I were only interested in table_1, I could do this:

SELECT v.value:position::NUMBER POSITION
       , v.value:team_name::STRING TEAM_NAME
       , v.value:games_played::NUMBER GAMES_PLAYED
       , ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v

and I could do the same for table_2 and UNION the results, but the table_N placeholder can take N possible values.

I have looked at using LATERAL FLATTEN multiple times:

SELECT v.value:position::NUMBER POSITION
       , v.value:team_name::STRING TEAM_NAME
       , v.value:games_played::NUMBER GAMES_PLAYED
       , ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v, LATERAL FLATTEN(JSON_DATA:table_2) v2

but this results in duplicated data, and it doesn't let me put the columns from every table into a single relational structure.
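The duplication comes from the two LATERAL FLATTEN calls in the query above being independent of each other, so every table_1 element is paired with every table_2 element (a cross product). The minimal Python sketch below is not Snowflake code: it uses plain dicts and lists as a hypothetical stand-in for one row's VARIANT value (only two fields per team kept) and counts the rows each pattern would produce.

```python
from itertools import product

# Simplified stand-in for one VARIANT value: only two fields per team are kept,
# and the third element of each array mirrors the "..." placeholder rows above.
doc = {
    "table_1": [
        {"position": "1", "team_name": "Liverpool"},
        {"position": "2", "team_name": "Man. City"},
        {"position": "...", "team_name": "..."},
    ],
    "table_2": [
        {"position": "1", "team_name": "Bayern Munich"},
        {"position": "2", "team_name": "Bayer Leverkussen"},
        {"position": "...", "team_name": "..."},
    ],
}


def independent_flattens(d):
    """Like FLATTEN(JSON_DATA:table_1) v, FLATTEN(JSON_DATA:table_2) v2:
    every table_1 element is paired with every table_2 element."""
    return list(product(d["table_1"], d["table_2"]))


def stacked_flattens(d):
    """Like one FLATTEN per table combined with UNION ALL: one row per team,
    tagged with the array it came from."""
    return [("table_1", row) for row in d["table_1"]] + [
        ("table_2", row) for row in d["table_2"]
    ]


cross = independent_flattens(doc)
stacked = stacked_flattens(doc)
print(len(cross), len(stacked))  # 9 6
# In the cross product, Liverpool is repeated once per table_2 element.
print(sum(1 for left, _ in cross if left["team_name"] == "Liverpool"))  # 3
```

Nine rows instead of six is the duplication described above, and it grows as the product of the array lengths. Stacking one FLATTEN per table avoids it, but only if every table_N name is known up front, which is exactly the limitation raised in the question.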

I'm sure I'm missing something simple here, but I've reached the point where I think I've been staring at this for too long and just can't see it.

Thanks in advance,

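If you are trying to create a single flattened view of the table_N data, together with the first-level attributes, then something like this will work: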

WITH x AS (
SELECT '{
  "dom_url": "https://www.soccertables.com/european_tables",
  "event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
  "event_utc_time": "2020-05-11 09:01:14.821",
  "ip_address": "125.238.134.96",
  "table_1": [
    {
      "position": "1",
      "team_name": "Liverpool",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Man. City",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "table_2": [
    {
      "position": "1",
      "team_name": "Bayern Munich",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Bayer Leverkussen",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "referrer_url": "https://www.soccertables.com"
}' as var)
SELECT
  parse_json(x.var):dom_url::string,
  parse_json(x.var):event_id::string,
  parse_json(x.var):event_utc_time::string,
  parse_json(x.var):ip_address::string,
  x3.value:games_drawn::string,
  x3.value:games_lost::string,
  x3.value:games_played::string,
  x3.value:games_won::string,
  x3.value:goals_against::string,
  x3.value:goals_for::string,
  x3.value:points::string,
  x3.value:position::string,
  x3.value:team_name::string
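FROM x
-- Outer FLATTEN: one row per top-level key (dom_url, event_id, table_1, table_2, ...)
,LATERAL FLATTEN(parse_json(x.var)) x2
-- Inner FLATTEN: one row per element of each table_N array; scalar values such as dom_url are not expanded, so they add no rows
,LATERAL FLATTEN(x2.value) x3;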

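The CTE is just there to demonstrate the query against the sample JSON you provided. If you care which records came from which table, you can also include x2.key as a column in the SELECT.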
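To check the shape of the result this nested FLATTEN produces, the same logic can be mirrored in Python: the outer loop plays the role of FLATTEN over the whole object (x2, one entry per top-level key), the inner loop plays FLATTEN(x2.value) (x3), and x2.key is kept as a table_name column. This is a sketch under stated assumptions, not Snowflake code: the document is abbreviated to two fields per team, and the filter on keys starting with "table_" is a hypothetical addition (in SQL, a condition such as WHERE STARTSWITH(x2.key, 'table_') could play that role). Skipping non-list values stands in for scalar keys like dom_url contributing no rows.

```python
import json

# Abbreviated copy of the sample document (two fields per team kept for brevity).
raw = """{
  "dom_url": "https://www.soccertables.com/european_tables",
  "event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
  "table_1": [{"position": "1", "team_name": "Liverpool"},
              {"position": "2", "team_name": "Man. City"}],
  "table_2": [{"position": "1", "team_name": "Bayern Munich"},
              {"position": "2", "team_name": "Bayer Leverkussen"}],
  "referrer_url": "https://www.soccertables.com"
}"""


def flatten_tables(doc, prefix="table_"):
    """Mirrors FLATTEN(doc) x2 followed by FLATTEN(x2.value) x3."""
    rows = []
    for key, value in doc.items():  # outer FLATTEN: one entry per top-level key
        # Hypothetical filter: keep only table_N arrays, skip dom_url, event_id, ...
        if not key.startswith(prefix) or not isinstance(value, list):
            continue
        for element in value:  # inner FLATTEN: one entry per array element
            rows.append({
                "dom_url": doc.get("dom_url"),  # first-level attribute repeated per row
                "table_name": key,  # the equivalent of x2.key
                "position": int(element["position"]),  # like ::NUMBER
                "team_name": element["team_name"],
            })
    return rows


for row in flatten_tables(json.loads(raw)):
    print(row["table_name"], row["position"], row["team_name"])
```

Each team becomes exactly one row tagged with the array it came from, which matches the flat table the question asks for. The int() cast mirrors ::NUMBER and would fail on the "..." placeholder rows in the sample, which is why they are left out here.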
