How to extract a JSON value in Hive

I have a JSON string that is stored in a single cell in the DB, corresponding to a parent ID:

{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2019-03-24T15:19:52.639Z","profileType":"ADULT","id":"abc","signupDeviceId":"1"}||{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2021-09-05T07:47:00.245Z","imageId":"19","profileType":"KIDS","name":"Kids","id":"efg","signupDeviceId":"1"}

Now I want to extract the id of each profile from the string above. Let's say we have data like this:

Parent ID  |  Profile JSON
1          |  {profile_json} (see above string)

I want the output to look like this:

Parent ID  |  ID
1          |  abc
1          |  efg

I've tried a couple of approaches to solve this.

First Approach:

select
    get_json_object(p.profile, '$$.id') as id,
    test.parent_id
from (
    select
        split(
            regexp_replace(
                regexp_extract(profiles, '^\\[(.+)\\]$$', 1),
                '\\}\\,\\{', '\\}\\|\\|\\{'),
            '\\|\\|') as profile_list,
        parent_id
    from source_table
) test
lateral view explode(test.profile_list) p as profile

But this returns NULL for every value in the id column. Is there something I'm missing here?

Second Approach:

with profiles as (
    select
        regexp_replace(
            regexp_extract(profiles, '^\\[(.+)\\]$$', 1),
            '\\}\\,\\{', '\\}\\|\\|\\{') as profile_list,
        parent_id
    from source_table
)

SELECT
    get_json_object(t1.profile_list, '$.id')
FROM profiles t1

The second approach returns only the first id (abc) from the JSON string above.
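A likely reason for the single id: in the second approach `profile_list` is still one string holding both objects joined by `||`, and `get_json_object` is evaluated once per row, so it can return at most one value. The sketch below adds the missing `split` and `explode` to the same CTE shape. It is only an illustration under assumptions: the inline `source_table` row, the shortened JSON objects, and the CTE name `profiles_split` are made up here, the stored value is assumed to be a bracketed JSON array (as the regular expression implies), and a single `$` anchor is used in the pattern.

```sql
-- Hypothetical inline data standing in for the real source_table.
WITH source_table AS (
    SELECT 1 AS parent_id,
           '[{"id":"abc","profileType":"ADULT"},{"id":"efg","profileType":"KIDS"}]' AS profiles
),
profiles_split AS (
    SELECT parent_id,
           split(
               regexp_replace(
                   -- strip the surrounding [ ] of the array
                   regexp_extract(profiles, '^\\[(.+)\\]$', 1),
                   -- turn the },{ between objects into }||{
                   '\\}\\,\\{', '\\}\\|\\|\\{'),
               -- split on the || delimiter into an array of JSON strings
               '\\|\\|') AS profile_list
    FROM source_table
)
SELECT t1.parent_id,
       get_json_object(p.profile, '$.id') AS id
FROM profiles_split t1
-- one output row per JSON object in profile_list
LATERAL VIEW explode(t1.profile_list) p AS profile;
```

Replacing `},{` by text only works while no nested value inside a profile contains that same sequence, so this remains a workaround rather than a real JSON array parser.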

I tried to replicate this in Apache Hive v4.

Data:

+----------------------------------------------------+------------------+
|                    data                    | parent_id  |
+----------------------------------------------------+------------------+
| {"profileState":"ACTIVE","isDefault":"true","joinedOn":"2019-03-24T15:19:52.639Z","profileType":"ADULT","id":"abc","signupDeviceId":"1"}||{"profileState":"ACTIVE","isDefault":"true","joinedOn":"2021-09-05T07:47:00.245Z","imageId":"19","profileType":"KIDS","name":"Kids","id":"efg","signupDeviceId":"1"} | 1.0              |
+----------------------------------------------------+------------------+

SQL:

select pid,get_json_object(expl_jid,'$.id') json_id from 
(select  parent_id pid,split(data,'\\|\\|') jid  from tabl1)a 
lateral view explode(jid) exp_tab as expl_jid;

+------+----------+
| pid  | json_id  |
+------+----------+
| 1.0  | abc      |
| 1.0  | efg      |
+------+----------+

Solved this. I was using an extra $ in the JSON path in the first approach ('$$.id' instead of '$.id'). The corrected query:

select
    get_json_object(p.profile, '$.id') as id,  -- '$.id', not '$$.id'
    test.parent_id
from (
    select
        split(
            regexp_replace(
                regexp_extract(profiles, '^\\[(.+)\\]$$', 1),
                '\\}\\,\\{', '\\}\\|\\|\\{'),
            '\\|\\|') as profile_list,
        parent_id
    from source_table
) test
lateral view explode(test.profile_list) p as profile
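To see the effect of the extra `$` in isolation, here is a minimal sketch that runs `get_json_object` with both paths against an inline literal. The literal, the alias `js`, and the output column names are made up for illustration and are not part of the original table.

```sql
-- Compare the two JSON paths on the same hypothetical input.
SELECT get_json_object(js, '$.id')  AS single_dollar_path,
       get_json_object(js, '$$.id') AS double_dollar_path
FROM (SELECT '{"id":"abc","profileType":"ADULT"}' AS js) t;
```

In the question, the `'$$.id'` form is the one that produced NULL for every row. If the query text passes through a tool that treats `$` specially (a scheduler or templating layer, for example), it may be worth checking what string Hive actually receives.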
