Flatten data source in Snowflake from Array
I am trying to fix an array in a dataset. Currently, I have a data set in which each reference number maps to multiple different UUIDs. What I would like to do is flatten this out in Snowflake so that the reference number has a separate row for each UUID. For example:
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 "[
""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",
""df662812-7f97-0b43-9d3e-12f64f504fbb"",
""08644a69-76ed-ce2d-afff-b236a22efa69"",
""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"",
]"
Should end up looking like:
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 05554f65-6aa9-4dd1-6271-8ce2d60f10c4
2) 9f823c2a-ced5-4dbe-be65-869311462f75 df662812-7f97-0b43-9d3e-12f64f504fbb
3) 9f823c2a-ced5-4dbe-be65-869311462f75 08644a69-76ed-ce2d-afff-b236a22efa69
4) 9f823c2a-ced5-4dbe-be65-869311462f75 f1162c2e-eeb5-83f6-5307-2ed644e6b9eb
I just started working in Snowflake, so I am new to it. It looks like there is a lateral flatten, but it is either not working or giving me all sorts of errors. The Snowflake documentation is a bit perplexing when it comes to this.
While FLATTEN is the right approach for exploding an array, the UUID column value shown in the original description is invalid if interpreted as JSON syntax: "[""val1"", ""val2""]". That will need correcting before a LATERAL FLATTEN approach can be applied by treating it as a VARIANT type.
If the data sample in the original description is literal and applies to all values in the column, then the following query will help transform it into valid JSON syntax and then apply a lateral flatten to yield the desired result:
SELECT
  T.REFERENCE,
  X.VALUE::STRING AS UUID  -- cast the VARIANT element to a plain string
FROM (
  SELECT
    REFERENCE,
    -- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]"
    -- to valid JSON: ["a", "b"] by stripping away unnecessary quotes:
    -- collapse "" to ", then drop the quote before [ and the quote after ]
    PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '"[', '['), ']"', ']')) AS UUID_ARR_CLEANED
  FROM TABLENAME) T,
LATERAL FLATTEN(INPUT => T.UUID_ARR_CLEANED) X
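To sanity-check the quote cleanup outside Snowflake, the same idea can be sketched in Python: collapse the doubled quotes, drop the quotes wrapping the brackets, parse the result as JSON, and emit one row per element. This is only a rough local illustration, not Snowflake code. The helper names `clean_uuid_array` and `flatten` are made up for this example, and the trailing-comma removal is an assumption added because Python's `json` module is strict; whether Snowflake's PARSE_JSON needs the same step is not verified here.

```python
import json
import re


def clean_uuid_array(raw):
    """Turn a value like "[""a"", ""b""]" into a Python list of strings."""
    text = raw.replace('""', '"')   # collapse doubled quotes: ""a"" -> "a"
    text = text.replace('"[', '[')  # drop the quote before the opening bracket
    text = text.replace(']"', ']')  # drop the quote after the closing bracket
    # Assumption: remove a trailing comma before ']' because json.loads rejects it.
    text = re.sub(r',\s*\]', ']', text)
    return json.loads(text)


def flatten(rows):
    """Yield one (reference, uuid) pair per array element."""
    for reference, raw in rows:
        for uuid in clean_uuid_array(raw):
            yield reference, uuid


# Sample row shaped like the one in the question (shortened to two UUIDs).
rows = [
    (
        "9f823c2a-ced5-4dbe-be65-869311462f75",
        '"[\n""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",\n'
        '""df662812-7f97-0b43-9d3e-12f64f504fbb"",\n]"',
    )
]

for reference, uuid in flatten(rows):
    print(reference, uuid)
```

If the cleaned string still fails to parse here, it is likely to need further cleanup before Snowflake can treat it as an array as well.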
If your data is already a valid VARIANT type, with PARSE_JSON successfully applied to the UUID column during ingest, and the example in the description is invalid JSON only because of how it was formatted in the post, then a simpler version of the same query will suffice:
SELECT REFERENCE, X.VALUE::STRING AS UUID
FROM TABLENAME, LATERAL FLATTEN(INPUT => TABLENAME.UUID) X