Flatten data source in Snowflake from Array

I am trying to fix an array in a dataset. Currently, I have a data set in which one reference number maps to multiple different UUIDs. What I would like to do is flatten this out in Snowflake so that the reference number has a separate row for each UUID. For example:

Reference                                UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75  "[
                                           ""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",
                                           ""df662812-7f97-0b43-9d3e-12f64f504fbb"",
                                           ""08644a69-76ed-ce2d-afff-b236a22efa69"",
                                           ""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"",
                                         ]"

Should end up looking like:

Reference                                UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75    05554f65-6aa9-4dd1-6271-8ce2d60f10c4
2) 9f823c2a-ced5-4dbe-be65-869311462f75    df662812-7f97-0b43-9d3e-12f64f504fbb
3) 9f823c2a-ced5-4dbe-be65-869311462f75    08644a69-76ed-ce2d-afff-b236a22efa69
4) 9f823c2a-ced5-4dbe-be65-869311462f75    f1162c2e-eeb5-83f6-5307-2ed644e6b9eb

I just started working in Snowflake, so I am new to it. It looks like there is a lateral flatten, but it is either not working or telling me that I have all sorts of errors. The Snowflake documentation is a bit perplexing when it comes to this.

While FLATTEN is the right approach for exploding an array, the UUID column value shown in the original description is not valid JSON if taken literally: "[""val1"", ""val2""]". It will need correcting before it can be treated as a VARIANT and passed to LATERAL FLATTEN.
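
To see why the literal value is rejected, here is a small local check in Python. It is only an illustration using the standard `json` module, not Snowflake code, and the helper name `is_valid_json` is made up for this sketch. It assumes the column holds the CSV-style quoted text exactly as shown in the question.

```python
import json

# The value as shown in the question: wrapped in quotes, inner quotes doubled.
raw = '"[""val1"", ""val2""]"'


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(raw))                 # False
print(is_valid_json('["val1", "val2"]'))  # True
```

A JSON parser reads the leading `"["` as a complete string and then stops at the text that follows it, which is why the wrapper quotes and the doubled quotes both have to go.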

If the data sample in the original description is literal and applies to every value in the column, then the following query will transform it into valid JSON and then apply a lateral flatten to yield the desired result:

SELECT
  T.REFERENCE,
  -- Cast the VARIANT element to a string so it is not displayed wrapped in double quotes
  X.VALUE::STRING AS UUID
FROM (
  SELECT
    REFERENCE,
    -- Attempts to transform an invalid JSON array string such as "[""a"", ""b""]"
    -- into valid JSON: ["a", "b"] by collapsing the doubled quotes and then
    -- stripping the wrapper quote before "[" and after "]"
    PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '"[', '['), ']"', ']')) AS UUID_ARR_CLEANED
  FROM TABLENAME
) T,
LATERAL FLATTEN(T.UUID_ARR_CLEANED) X
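
The cleanup-then-flatten idea can be tried locally before running anything in Snowflake. The Python sketch below is an illustration only: `clean_uuid_array` and `flatten` are hypothetical helper names, and the sample row is a shortened version of the question's value, without its line breaks or the trailing comma after the last element. It collapses the doubled quotes, drops the wrapper quote before `[` and after `]`, parses the result, and emits one row per UUID.

```python
import json


def clean_uuid_array(raw):
    """Turn '"[""a"", ""b""]"' into the Python list ['a', 'b']."""
    text = raw.replace('""', '"')   # collapse doubled quotes: ""a"" -> "a"
    text = text.replace('"[', '[')  # drop the wrapper quote before the array
    text = text.replace(']"', ']')  # drop the wrapper quote after the array
    return json.loads(text)


def flatten(rows):
    """Yield one (reference, uuid) pair per array element."""
    for reference, raw in rows:
        for uuid in clean_uuid_array(raw):
            yield reference, uuid


# Shortened sample row (assumption: same quoting style as in the question).
rows = [
    (
        "9f823c2a-ced5-4dbe-be65-869311462f75",
        '"[""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"", '
        '""df662812-7f97-0b43-9d3e-12f64f504fbb""]"',
    )
]

for reference, uuid in flatten(rows):
    print(reference, uuid)
```

The order of the replacements matters: the doubled quotes are collapsed first, so that the only remaining `"[` and `]"` sequences are the wrapper quotes at the two ends of the value.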

If your data is already a valid VARIANT, with PARSE_JSON successfully applied to the UUID column during ingest, and the example in the description only looks like invalid JSON because of how it was formatted in the post, then a simpler version of the query above will suffice:

-- Cast the VARIANT element to a string so it is not displayed wrapped in double quotes
SELECT REFERENCE, X.VALUE::STRING AS UUID
FROM TABLENAME, LATERAL FLATTEN(TABLENAME.UUID) X
