Parse string as JSON with Snowflake SQL

I have a field in a table of our DB that works like an event payload, where all changes to different entities are gathered. See the example below for a single field of the object:

'---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc'

Since accessing this field with pure SQL is a pain, I was thinking of parsing it as JSON so that it would look like this:

{
  "field_one":"1", 
  "field_two": "20", 
  "field_three": "4", 
  "id": "1234",
  "another_id": "5678",
  "some_text": "Hey you",
  "a_date": "2022-11-29",
  "utc": "2022-11-29 15:29:28.159296000 Z",
  "another_date": "2022-11-30",
  "utc": "2022-11-30 13:34:59.000000000 Z"
}

Then I could just use a Snowflake-native approach to access the values I need.

As you can see, though, there are two fields called utc: the first refers to the first date (a_date), and the second refers to the second date (another_date). I believe these are nested in the object, but it's difficult to tell from the format of the field.
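If each utc really does belong to the date before it, one way to keep both values without renaming anything is to nest the timestamp under that date's key. The sketch below shows that target structure in Python (which Snowflake also accepts for UDFs). It is a minimal sketch assuming that a utc line always refers to the closest preceding key; the function name nest_utc and the nested field names value and utc are hypothetical.

```python
import json


def nest_utc(obj: str) -> dict:
    """Attach each 'utc' line to the key that precedes it as a nested object."""
    result = {}
    last_key = None
    for line in obj.splitlines():
        line = line.strip()  # also drops any indentation / extra spaces
        if not line or line == "---":
            continue  # skip the leading '---' marker and blank lines
        # Split on the first colon only, so timestamps like 15:29:28 survive.
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "utc" and last_key is not None:
            # Hypothetical shape: {"a_date": {"value": "...", "utc": "..."}}
            result[last_key] = {"value": result[last_key], "utc": value}
        else:
            result[key] = value
            last_key = key
    return result


row = ("---\na_date: 2022-11-29\nutc: this_utc\n"
       "another_date: 2022-11-30\nutc: another_utc")
print(json.dumps(nest_utc(row), indent=2))
```

Because the duplicate key never reaches the output, the resulting document has unique keys at every level regardless of how many dates a row carries.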

This is a problem because, once I give the string the format I need and run parse_json(), I can't differentiate one utc from the other, since both keys have the same name.

My SQL so far looks like this:

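select
    object,
    -- swap the leading '---' line for '{"' and close the object with '"}'
    replace(object, '---\n', '{"') || '"}' as first,
    -- each newline separates two key/value pairs
    replace(first, '\n', '","') as second_,
    -- each ': ' separates a key from its value
    replace(second_, ': ', '":"') as third,
    -- remove runs of four, then two, spaces
    replace(third, '    ', '') as fourth,
    replace(fourth, '  ', '') as last
from my_table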

(The last two steps, fourth and last, are needed because some fields contain extra spaces.)

This actually gives me the format I need, but because of the duplicate utc keys described above, I cannot parse the string as JSON.
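To see the failure outside Snowflake, the replace chain above can be mirrored step by step in Python. This is a minimal sketch assuming the column holds real newline characters; the function names to_json_text and reject_duplicates are hypothetical. Note that Python's json.loads silently keeps the last value of a duplicated key instead of rejecting the document, so the sketch passes an object_pairs_hook to make the duplicate visible.

```python
import json

PAYLOAD = (
    "---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\n"
    "another_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\n"
    "utc: this_utc\nanother_date: 2022-11-30\nutc: another_utc"
)


def to_json_text(obj: str) -> str:
    """Mirror the SQL replace() chain from the question, one step per line."""
    first = obj.replace("---\n", '{"') + '"}'
    second_ = first.replace("\n", '","')
    third = second_.replace(": ", '":"')
    fourth = third.replace("    ", "")
    return fourth.replace("  ", "")


def reject_duplicates(pairs):
    """object_pairs_hook that fails on a repeated key instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


text = to_json_text(PAYLOAD)
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key: 'utc'
```

The string itself is well-formed JSON text; the only problem is that the same key appears twice in one object.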

Also note that the structure of the string can change from row to row: some rows might contain two utc keys, others only one, and others as many as five.

Any ideas on how to overcome this?

Replace only the second occurrence with regexp_replace():

with data as (
    select '---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc' o
)

select parse_json(last2)
from (
    select o,
        replace(o, '---\n', '{"') || '"}' as first,
        replace(first, '\n', '","') as second_,
        replace(second_, ': ', '":"') as third,
        replace(third, '    ', '') as fourth,
        replace(fourth, '  ', '') as last,
        -- position 1, occurrence 2: rename only the second "utc" key to "utc2"
        regexp_replace(last, '"utc"', '"utc2"', 1, 2) last2
    from data
)
;
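The regexp_replace() call above renames only the second occurrence, so it covers rows with exactly two utc keys. Since the question says a row may carry anywhere from one to five, one way to generalize is to number every repeat of a key. The sketch below does this in Python, assuming one key: value pair per line; the function name payload_to_json and the suffix scheme (utc, utc2, utc3, ...) are hypothetical choices that match the utc2 used above. Packaging it as a Snowflake Python UDF is plausible but not shown.

```python
import json


def payload_to_json(obj: str) -> str:
    """Convert a YAML-like 'key: value' payload to a JSON string, numbering repeated keys."""
    counts = {}   # how many times each key has been seen so far
    result = {}   # insertion-ordered output object
    for line in obj.splitlines():
        line = line.strip()  # drops indentation / extra spaces
        if not line or line == "---":
            continue  # skip the leading '---' marker and blank lines
        # Split on the first colon only, so timestamps like 15:29:28 survive.
        key, _, value = line.partition(":")
        counts[key] = counts.get(key, 0) + 1
        # The first occurrence keeps its name; later ones become key2, key3, ...
        name = key if counts[key] == 1 else f"{key}{counts[key]}"
        result[name] = value.strip()
    return json.dumps(result)


row = ("---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\n"
       "another_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\n"
       "utc: this_utc\nanother_date: 2022-11-30\nutc: another_utc")
print(payload_to_json(row))
```

One caveat of this naming scheme: if a row could already contain a literal key such as utc2, the generated name would collide with it, so a more distinctive suffix might be safer.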


This may not be what you want, but it seems to me your problem could be solved if each UTC timestamp replaced the date that precedes it, so that the timestamp is stored under the date's key, which is not duplicated. You can always derive the dates once you have the timestamps. If this makes sense, see whether you can apply your parse_json solution to this output instead:

set str='---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: 2022-11-29 15:29:28.159296000 Z\nanother_date: 2022-11-30\nutc: 2022-11-30 13:34:59.000000000 Z';

               
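-- remove each "YYYY-MM-DD\nutc: " so the timestamp becomes the value of the date key before it
select regexp_replace($str, '[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc: ', '');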
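The same idea can be sketched in Python to show what the regular expression does to the sample string: every date that is immediately followed by a utc line collapses, so the timestamp ends up under the date's own key. This is a minimal sketch assuming each utc line directly follows the date it belongs to; re.sub stands in for regexp_replace() here, and the pattern also consumes the space after utc: so the value does not start with an extra space.

```python
import re

STR = ("---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\n"
       "another_id: 5678\nsome_text: Hey you\n"
       "a_date: 2022-11-29\nutc: 2022-11-29 15:29:28.159296000 Z\n"
       "another_date: 2022-11-30\nutc: 2022-11-30 13:34:59.000000000 Z")

# Drop "YYYY-MM-DD\nutc: " so the timestamp that follows becomes the value of
# the preceding date key (a_date, another_date), whose names are unique.
folded = re.sub(r"[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc: ", "", STR)
print(folded)
```

After this step no utc key is left, so the original replace chain can be applied and parsed without duplicate keys.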
