
How to flatten a json structure in HIVE?

I have a JSON structure that looks like this (there is a structure nested in a column):

HEADER:

{"user":{"location":"USA","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell","createddate":"2013-06-20T12:08:44","identifier":"1234","geolocation":null}

When I query this, the following works:

SELECT *
FROM TBL a
WHERE header.identifier = '1234'

But when I also want to filter on the location (inside the nested JSON structure), it does not work:

SELECT *
FROM TBL a
WHERE header.identifier = '1234'
and a.header.user.location LIKE '%USA%'

Does anyone know how to query this in HIVE?

To flatten a JSON structure, you first need to create a lateral view using json_tuple; that is how you can achieve what you intend. Here is the complete solution.

Step 1: Create an external table tweets with a single column tweet of data type string.

CREATE EXTERNAL table tweets (tweet string);

Step 2: Put the following JSON string, on a single line, into a text file named tweets.txt:

{"user":{"location":"USA","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell","createddate":"2013-06-20T12:08:44","identifier":"1234","geolocation":null}

Then run the command below to load the data from the text file into the Hive table:

LOAD data local inpath 'tweets.txt' into table tweets;
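A rough way to picture what this load produces, assuming Hive's default text format (one row per line): the Python sketch below is only a model, not Hive code, and the helper names `load_rows` and `parses_as_json` are hypothetical. Each line of the file becomes one `tweet` value, which is why each JSON document must sit on a single line; a pretty-printed document would be split into fragments that no longer parse as JSON.

```python
import json

# The sample document from the question, kept on ONE line as tweets.txt expects.
TWEET = '{"user":{"location":"USA","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell","createddate":"2013-06-20T12:08:44","identifier":"1234","geolocation":null}'


def load_rows(file_text):
    """Hypothetical model of loading a text file into a one-column table:
    every line becomes one row holding the raw line as a string."""
    return file_text.splitlines()


def parses_as_json(s):
    """True if the string is one complete JSON document."""
    try:
        json.loads(s)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# One document per line: exactly one row, and that row is valid JSON.
good_rows = load_rows(TWEET + "\n")
print(len(good_rows), parses_as_json(good_rows[0]))  # 1 True

# The same document pretty-printed: many rows, none of them valid JSON on its own.
bad_rows = load_rows(json.dumps(json.loads(TWEET), indent=2) + "\n")
print(len(bad_rows) > 1, any(parses_as_json(r) for r in bad_rows))  # True False
```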

Once that is done, we are ready to work with our JSON string.

What we want to achieve here is querying on the identifier and location fields, which sit at different levels of nesting:

user
    location    :   USA
    id          :   1514008171
    name        :   Auzzie Jet
    screenname  :   metalheadgrunge
    geoenabled  :   false
tweetmessage    :   Anthrax - Black - Sunglasses hell
createddate     :   2013-06-20T12:08:44
identifier      :   1234
geolocation     :   null

Level 1 fields are: user, tweetmessage, createddate, identifier, geolocation
Level 2 fields (inside user) are: location, id, name, screenname, geoenabled

So first we need a lateral view on Level 1 so that we can query the Level 1 fields; in our example, that is identifier. To reach the Level 2 fields as well, the same lateral view also extracts user, which json_tuple returns as a JSON string that a second lateral view can parse:

LATERAL VIEW json_tuple(t.tweet, 'user', 'identifier' ) t1 as `user`, `identifier`
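To see why a second lateral view is needed, here is a rough Python model of what `json_tuple(t.tweet, 'user', 'identifier')` yields for the sample row. The Python function `json_tuple` below is a hypothetical stand-in that mimics the Hive UDTF as we understand it: every output is a string, a nested object comes back as its JSON text, and a missing key or JSON null comes back as NULL (None here). So `user` is just another JSON string, not a structure you can dot into.

```python
import json

TWEET = '{"user":{"location":"USA","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell","createddate":"2013-06-20T12:08:44","identifier":"1234","geolocation":null}'


def json_tuple(json_str, *keys):
    """Hypothetical Python stand-in for Hive's json_tuple (not Hive code).

    Returns one value per key: scalars as strings, nested objects/arrays
    re-serialized as compact JSON text, missing keys or JSON null as None.
    Input that is not a JSON object yields None for every key."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    values = []
    for key in keys:
        value = obj.get(key)
        if value is None:
            values.append(None)
        elif isinstance(value, (dict, list)):
            values.append(json.dumps(value, separators=(",", ":")))
        elif isinstance(value, bool):
            values.append("true" if value else "false")  # JSON spelling, not Python's
        else:
            values.append(str(value))
    return tuple(values)


# Level 1, like: LATERAL VIEW json_tuple(t.tweet, 'user', 'identifier') t1
user, identifier = json_tuple(TWEET, "user", "identifier")
print(identifier)                      # 1234
print(type(user).__name__, user[:18])  # str {"location":"USA",
```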

Then, to query on location, we need another lateral view for the Level 2 fields:

LATERAL VIEW json_tuple(t1.`user`,'name', 'location') t2 as `name`, `location`

That's it: we can now SELECT from tweets using these lateral views.

Step 3 and Final Query:

SELECT t.*
FROM tweets t
LATERAL VIEW json_tuple(t.tweet, 'user', 'identifier') t1 AS `user`, `identifier`
LATERAL VIEW json_tuple(t1.`user`, 'name', 'location') t2 AS `name`, `location`
WHERE t1.`identifier` = '1234' AND t2.`location` = 'USA';
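The whole query can be modeled the same way: a row is kept only when both the Level 1 condition and the Level 2 condition hold, and the second json_tuple parses the `user` string produced by the first. The sketch below is Python, not Hive; `json_tuple` is a hypothetical, simplified stand-in for the Hive UDTF, `select_matching` is a hypothetical name, and the second row (location "UK") is made up purely to show the filter dropping a row.

```python
import json

TWEET = '{"user":{"location":"USA","id":1514008171,"name":"Auzzie Jet","screenname":"metalheadgrunge","geoenabled":false},"tweetmessage":"Anthrax - Black - Sunglasses hell","createddate":"2013-06-20T12:08:44","identifier":"1234","geolocation":null}'


def json_tuple(json_str, *keys):
    """Simplified hypothetical stand-in for Hive's json_tuple: nested objects
    come back as JSON text, missing/null/unparseable values as None."""
    try:
        obj = json.loads(json_str)
    except (TypeError, ValueError):
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    out = []
    for key in keys:
        value = obj.get(key)
        if value is None:
            out.append(None)
        elif isinstance(value, (dict, list)):
            out.append(json.dumps(value, separators=(",", ":")))
        elif isinstance(value, bool):
            out.append("true" if value else "false")
        else:
            out.append(str(value))
    return tuple(out)


def select_matching(rows, identifier_value, location_value):
    """Model of the final query: two chained lateral views, then the WHERE filter."""
    matches = []
    for tweet in rows:
        user, identifier = json_tuple(tweet, "user", "identifier")  # t1
        _name, location = json_tuple(user, "name", "location")      # t2 parses t1's user string
        if identifier == identifier_value and location == location_value:
            matches.append(tweet)  # SELECT t.* returns the original raw row
    return matches


other = TWEET.replace('"USA"', '"UK"')  # hypothetical second row
print(select_matching([TWEET, other], "1234", "USA") == [TWEET])  # True
```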

For more on lateral views, see LateralView; for json_tuple, see JsonTuple.

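Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this page, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.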

© 2020-2024 STACKOOM.COM