HiveQL: How to write a query to select and filter records based on nested JSON array values
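In our logging database we store custom UI data as a serialized JSON string. I have been using lateral view json_tuple() to traverse the JSON object and extract nested values. However, I need to filter some of my query results based on whether an array of objects contains certain values. After doing some digging I think I need to use lateral view explode(), but I am not a HiveQL expert and I am not sure how to use it in the way I need.
Example (simplified for clarity and brevity):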
// ui_events table schema
eventDate, eventType, eventData
// serialized JSON string stored in eventData
{ foo: { bar: [{ x: 1, y: 0 }, { x: 0, y: 1 }] } }
// HiveQL query
select
eventDate,
x,
y
from ui_events
lateral view json_tuple(eventData, 'foo') as foo
lateral view json_tuple(foo, 'bar') as bar
// <-- how to select only sub-item(s) in bar where x = 0 and y = 1
where
eventType = 'custom'
and // <-- how to only return records where at least 1 `bar` item was found above?
Any help would be greatly appreciated. Thanks!
Read the comments in the code. You can filter the resulting dataset as needed:
with my_table as (
  select stack(2,
           '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
           '{ "foo": { } }'
         ) as EventData
)
select *
from (
  -- get_json_object returns a string, not an array, so:
  -- remove the outer [],
  -- then replace the delimiter between },{ with ,,,
  -- to be able to split the array
  select regexp_replace(
           regexp_replace(get_json_object(EventData, '$.foo.bar'), '^\\[|\\]$', ''),
           '\\},\\{', '},,,{'
         ) as bar
  from my_table t
) s
-- explode the array
lateral view explode(split(s.bar, ',,,')) b as bar_element
-- get the struct elements
lateral view json_tuple(b.bar_element, 'x', 'y') e as x, y
Result:
s.bar b.bar_element e.x e.y
{"x":1,"y":0},,,{"x":0,"y":1} {"x":1,"y":0} 1 0
{"x":1,"y":0},,,{"x":0,"y":1} {"x":0,"y":1} 0 1
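To make the filtering step concrete, here is a minimal sketch of how the same pattern might be applied to the question's ui_events table. The CTE and its two rows (including the dates) are hypothetical sample data standing in for the real table, and the split on `},{` assumes the array elements are flat objects with no nested objects of their own. Because json_tuple returns strings, the sketch compares x and y against string literals.

```sql
-- Hypothetical sample rows standing in for the real ui_events table
with ui_events as (
  select stack(2,
           '2020-01-01', 'custom', '{ "foo": { "bar": [{ "x": 1, "y": 0 }, { "x": 0, "y": 1 }] } }',
           '2020-01-02', 'custom', '{ "foo": { } }'
         ) as (eventDate, eventType, eventData)
)
select s.eventDate, e.x, e.y
from (
  -- turn the JSON array string into a ',,,'-delimited list of objects
  select eventDate,
         regexp_replace(
           regexp_replace(get_json_object(eventData, '$.foo.bar'), '^\\[|\\]$', ''),
           '\\},\\{', '},,,{'
         ) as bar
  from ui_events
  where eventType = 'custom'
) s
-- one output row per array element
lateral view explode(split(s.bar, ',,,')) b as bar_element
-- extract x and y from each element (both come back as strings)
lateral view json_tuple(b.bar_element, 'x', 'y') e as x, y
-- keep only the elements where x = 0 and y = 1
where e.x = '0' and e.y = '1';
```

Since these are plain lateral views rather than lateral view outer, a record whose bar array is missing or empty produces no exploded rows and should drop out on its own, so only records with at least one matching bar item are expected to come back.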