In our logging database we store custom UI data as a serialized JSON string. I have been using lateral view json_tuple() to traverse the JSON object and extract nested values. However, I need to filter some of my query results based on whether an array of objects contains certain values or not. After doing some digging I think I need to use lateral view explode() , but I am not a HiveQL expert and I'm not sure exactly how to use this in the way I need.
EX : (simplified for clarity and brevity)
// ui_events table schema
eventDate, eventType, eventData
// serialized JSON string stored in eventData
{ foo: { bar: [{ x: 1, y: 0 }, { x: 0, y: 1 }] } }
// HiveQL query
select
eventDate,
x,
y
from ui_events
lateral view json_tuple(eventData, 'foo') as foo
lateral view json_tuple(foo, 'bar') as bar
// <-- how to select only sub-item(s) in bar where x = 0 and y = 1
where
eventType = 'custom'
and // <-- how to only return records where at least 1 `bar` item was found above?
Any help would be greatly appreciated. Thanks!
Read comments in the code. You can filter the dataset as you want:
with
my_table as(
select stack(2, '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
'{ "foo": { } }'
) as EventData
)
select * from
(
select --get_json_object returns string, not array.
--remove outer []
--and replace delimiter between },{ with ,,,
--to be able to split array
regexp_replace(regexp_replace(get_json_object(EventData, '$.foo.bar'),'^\\[|\\]$',''),
'\\},\\{', '},,,{'
)bar
from my_table t
) s --explode array
lateral view explode (split(s.bar,',,,')) b as bar_element
--get struct elements
lateral view json_tuple(b.bar_element, 'x','y') e as x, y
Result:
s.bar b.bar_element e.x e.y
{"x":1,"y":0},,,{"x":0,"y":1} {"x":1,"y":0} 1 0
{"x":1,"y":0},,,{"x":0,"y":1} {"x":0,"y":1} 0 1
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.