
Flatten multiple named arrays within a variant JSON column in Snowflake

I have a web scraper dumping data into a variant column in a Snowflake database. It scrapes page data and then creates JSON arrays for the various tables found within the page.

Here is an example of the type of JSON I would find, using a soccer analogy:

{
  "dom_url": "https://www.soccertables.com/european_tables",
  "event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
  "event_utc_time": "2020-05-11 09:01:14.821",
  "ip_address": "125.238.134.96",
  "table_1": [
    {
      "position": "1",
      "team_name": "Liverpool",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Man. City",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "table_2": [
    {
      "position": "1",
      "team_name": "Bayern Munich",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Bayer Leverkussen",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "referrer_url": "https://www.soccertables.com"
}

Ideally, I'd like the output of this to be a flat, relational table:

table_name  position  team_name      games_played  etc...
table_1     1         Liverpool      29            ...
table_1     2         Man. City      29            ...
table_2     1         Bayern Munich  29            ...
...

I know that if I were only interested in table_1 I could do this:

SELECT v.value:position::NUMBER POSITION
       , v.value:team_name::STRING TEAM_NAME
       , v.value:games_played::NUMBER GAMES_PLAYED
       , ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v

and that I could do the same for table_2 and union the results, but the page can contain any number of tables, so the table_N key can take N different values.

I've looked at doing LATERAL FLATTEN multiple times:

SELECT v.value:position::NUMBER POSITION
       , v.value:team_name::STRING TEAM_NAME
       , v.value:games_played::NUMBER GAMES_PLAYED
       , ...
FROM JSON_TABLE a1, LATERAL FLATTEN(JSON_DATA:table_1) v, LATERAL FLATTEN(JSON_DATA:table_2) v2

But this results in duplication of data, and does not allow me to put each table's columns all in a single relational structure.

I'm sure there is something simple that I am missing here, but I've reached a point where I think I've been staring at this too long and just can't see it.

Thanks in advance, S

If you are trying to create a single, flattened view of the table_n data, as well as the attributes at the first level, then something like this would work.

WITH x AS (
SELECT '{
  "dom_url": "https://www.soccertables.com/european_tables",
  "event_id": "01b2722a-d8e6-4f67-95d0-8dd7ba088a4a",
  "event_utc_time": "2020-05-11 09:01:14.821",
  "ip_address": "125.238.134.96",
  "table_1": [
    {
      "position": "1",
      "team_name": "Liverpool",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Man. City",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "table_2": [
    {
      "position": "1",
      "team_name": "Bayern Munich",
      "games_played": "29",
      "games_won": "26",
      "games_drawn": "2",
      "games_lost": "1",
      "goals_for": "75",
      "goals_against": "35",
      "points": "80"
    },
    {
      "position": "2",
      "team_name": "Bayer Leverkussen",
      "games_played": "29",
      "games_won": "20",
      "games_drawn": "5",
      "games_lost": "4",
      "goals_for": "60",
      "goals_against": "45",
      "points": "65"
    },
    {
      "position": "...",
      "team_name": "...",
      "games_played": "...",
      "games_won": "...",
      "games_drawn": "...",
      "games_lost": "...",
      "goals_for": "...",
      "goals_against": "...",
      "points": "..."
    }
  ],
  "referrer_url": "https://www.soccertables.com"
}' as var)
SELECT
  parse_json(x.var):dom_url::string,
  parse_json(x.var):event_id::string,
  parse_json(x.var):event_utc_time::string,
  parse_json(x.var):ip_address::string,
  x3.value:games_drawn::string,
  x3.value:games_lost::string,
  x3.value:games_played::string,
  x3.value:games_won::string,
  x3.value:goals_against::string,
  x3.value:goals_for::string,
  x3.value:points::string,
  x3.value:position::string,
  x3.value:team_name::string
FROM x
,LATERAL FLATTEN(parse_json(x.var)) x2
,LATERAL FLATTEN(X2.VALUE) x3;

The CTE is only there to run the example against the sample JSON you provided. If you care about which records came from which table, you can also include x2.key as a column in your SELECT.
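As a rough sketch of how the same idea might look against the table from the question, the query below assumes the JSON_TABLE table and JSON_DATA variant column shown there, and exposes the outer FLATTEN key as a table_name column. The aliases t and r, the STARTSWITH filter on a 'table_' prefix, and the use of TRY_TO_NUMBER (so that placeholder values such as "..." become NULL instead of raising a cast error) are my own assumptions, not part of the original answer.

```sql
-- Sketch only: assumes JSON_TABLE(JSON_DATA VARIANT) as in the question.
SELECT a.JSON_DATA:event_id::STRING                  AS event_id
     , t.key                                         AS table_name   -- 'table_1', 'table_2', ...
     , TRY_TO_NUMBER(r.value:position::STRING)       AS position
     , r.value:team_name::STRING                     AS team_name
     , TRY_TO_NUMBER(r.value:games_played::STRING)   AS games_played
FROM JSON_TABLE a
   , LATERAL FLATTEN(input => a.JSON_DATA) t         -- one row per top-level key
   , LATERAL FLATTEN(input => t.value) r             -- one row per element of that key's array
WHERE STARTSWITH(t.key, 'table_')                    -- assumed naming convention for the scraped tables
  AND IS_ARRAY(t.value);
```

The two FLATTEN calls are chained (the second reads from the first's value) rather than applied side by side to JSON_DATA, which is why this avoids the row duplication described in the question. Scalar keys such as dom_url would produce no rows from the inner FLATTEN anyway, so the WHERE clause mainly guards against other top-level arrays that are not scraped tables.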
