
Extract Array from JSON in HiveQL

I have a column which contains a large JSON object. For example, let's call the column Column1, and this is a typical element:

{"key1":value,"key2":[{"subK11":val,"subK12":val},{"subK21":val,"subK22":val}]}

So, I can extract a normal element fine using:

select get_json_object(Column1,'$.key1') as key1

But I have been unable to figure out how to extract the ARRAY in a usable form, because this:

select get_json_object(Column1,'$.key2') as key2 

Returns a STRING type. So I can't select elements from the array like normal. That is, this query will fail:

select key2[1] as first_element
from
(select get_json_object(Column1,'$.key2') as key2)

OR

select explode(key2)
from
(select get_json_object(Column1,'$.key2') as key2 )

Both give errors; the latter says "explode() requires array type". So the issue, I think, is that get_json_object returns a string. I need it to recognize that key2 contains an ARRAY, but I have no idea how to do that.

I'm new to Hive SQL, mainly an SQL user, so please let me know if there's anything crazy obvious I'm missing. I have not found a solution to this type of problem on any of the other questions.

You can use hive-third-functions, which provides a json_array_extract function. With it you can extract values from a JSON array like this:

json_array_extract("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b") => ["\"13\"","\"18\"","\"12\""]
json_array_extract_scalar("[{\"a\":{\"b\":\"13\"}}, {\"a\":{\"b\":\"18\"}}, {\"a\":{\"b\":\"12\"}}]", "$.a.b") => ["13","18","12"]
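
If installing a third-party UDF is not an option, something along these lines may work with built-in functions only. This is a rough sketch, not a general JSON parser: the table name my_table is hypothetical, the '|||' delimiter is an assumption (it must not occur anywhere in your data), and the split will break if the array elements themselves contain nested objects separated by "},{".

```sql
-- Hypothetical table my_table; Column1 holds the JSON string from the question.

-- 1) A single element at a known position: put the index in the JSON path.
--    JSON path subscripts are zero-based, so [0] is the first object.
SELECT
  get_json_object(Column1, '$.key2[0]')        AS first_element,
  get_json_object(Column1, '$.key2[0].subK11') AS first_subK11
FROM my_table;

-- 2) One row per array element:
--    a) strip the outer [ and ] from the array string,
--    b) replace the "},{" boundary between objects with "}|||{",
--    c) split on the delimiter to get array<string>, then explode it.
SELECT
  item,                                         -- one JSON object per row, still a string
  get_json_object(item, '$.subK11') AS subK11   -- NULL for elements without this key
FROM my_table t
LATERAL VIEW explode(
  split(
    regexp_replace(
      regexp_replace(get_json_object(t.Column1, '$.key2'), '^\\[|\\]$', ''),
      '\\}\\s*,\\s*\\{',
      '}|||{'
    ),
    '\\|\\|\\|'
  )
) x AS item;
```

Each exploded item is still a STRING, so individual fields are read with get_json_object(item, ...) as in the last select list. For deeply nested or irregular JSON, a JSON SerDe or a UDF such as the one above is likely the safer route.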
