
Hive out-of-the-box JSON parser

I have a text file containing JSON records that I would like to load into Hive. My JSON looks like:

{"vr":1,"tm":1312816191516,"tms":"08-08-2011 15:09:51.516 GMT","as":1002,"pb":1102,"cts":[1204,1205],"ctgs":[1304,1305],"op":1400,"ev":2,"dv":1503,"dvgs":[1605,1606],"cnt":"cnt5","usr":"usr8","atts":[{"id":8002,"val":"ccc"},{"id":8003,"val":"ddd"}],"sel":{"cm":2102,"ty":"PRE","ag":3002,"ad":4002,"fl":5002,"fla":6002,"hg":7002,"mc":"WAP","pr":0.1}}

As you can see, the JSON is nested, with arrays of primitives and an array of objects.

Is it possible to load it as is into Hive using any built-in function?

Yosi

You can use a custom SerDe to read JSON files into Hive tables. See the following SerDe on GitHub - https://github.com/rcongiu/Hive-JSON-Serde
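As a rough sketch of what this could look like for the record in the question: the table name, jar path and HDFS location below are placeholders, not anything from the original post, and the column types are inferred from the single sample record. The SerDe class name is the one shipped by Hive-JSON-Serde. Note that `as` is a reserved word in HiveQL, so it is quoted with backticks here, which assumes a Hive version that accepts quoted identifiers.

```sql
-- Hypothetical jar path; use the jar you built or downloaded from the project.
ADD JAR /path/to/json-serde-jar-with-dependencies.jar;

-- One JSON record per line; nested values map to ARRAY and STRUCT columns.
CREATE EXTERNAL TABLE events_json (
  vr   INT,
  tm   BIGINT,
  tms  STRING,
  `as` INT,                                   -- reserved word, hence the backticks
  pb   INT,
  cts  ARRAY<INT>,                            -- array of primitives
  ctgs ARRAY<INT>,
  op   INT,
  ev   INT,
  dv   INT,
  dvgs ARRAY<INT>,
  cnt  STRING,
  usr  STRING,
  atts ARRAY<STRUCT<id:INT, val:STRING>>,     -- array of objects
  sel  STRUCT<cm:INT, ty:STRING, ag:INT, ad:INT, fl:INT,
              fla:INT, hg:INT, mc:STRING, pr:DOUBLE>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/events_json';                 -- hypothetical HDFS directory

-- Nested values are then addressed with ordinary Hive syntax.
SELECT usr, cts[0], atts[0].val, sel.mc
FROM events_json;
```

With this approach the JSON is parsed once by the SerDe and the nested fields become typed columns, at the cost of having to ship an extra jar.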

Also check out Brickhouse - https://github.com/klout/brickhouse . It has quite decent UDFs for JSON (such as json_split and json_map). With Brickhouse and get_json_object / json_tuple (also mentioned by Nija here) you can even avoid custom SerDes such as Hive-JSON-Serde.
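For the built-in part of this suggestion, a minimal sketch with json_tuple might look as follows. It assumes a hypothetical table events_raw that holds each JSON record as one string in a column named json; neither name comes from the original post. json_tuple only reads top-level keys, so the nested arrays and the sel object come back as JSON strings that would need further parsing.

```sql
-- Hypothetical staging table: one raw JSON record per row.
CREATE TABLE IF NOT EXISTS events_raw (json STRING);

-- json_tuple is a table function, so it is used through LATERAL VIEW.
-- All extracted values are returned as strings.
SELECT t.usr, t.cnt, t.tm, t.sel
FROM events_raw e
LATERAL VIEW json_tuple(e.json, 'usr', 'cnt', 'tm', 'sel') t
  AS usr, cnt, tm, sel;
```

Compared with calling get_json_object once per field, json_tuple extracts several top-level keys in a single call.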

You should be able to load it into Hive as is. You may need to escape the double quotes; I haven't loaded JSON into Hive myself, so I'm not 100% sure whether any escaping is needed.

To access the JSON elements once the data is in Hive, use the built-in function get_json_object, which is described in detail at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-getjsonobject
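A minimal sketch of this approach, assuming the file is loaded into a hypothetical single-column table (the table name, column name and file path are placeholders). With Hive's default text format the field delimiter is the Ctrl-A character, which does not occur in the sample record, so each line should land in the one string column without any escaping of the quotes.

```sql
-- Hypothetical staging table: each line of the file becomes one row.
CREATE TABLE IF NOT EXISTS events_raw (json STRING);

-- Hypothetical local path to the file of JSON records.
LOAD DATA LOCAL INPATH '/tmp/events.json' INTO TABLE events_raw;

-- get_json_object takes a JSON string and a path starting at '$'.
-- '.' selects a child key and '[n]' selects an array element.
SELECT
  get_json_object(json, '$.usr')         AS usr,            -- top-level string
  get_json_object(json, '$.cts[0]')      AS first_ct,       -- element of a primitive array
  get_json_object(json, '$.atts[1].val') AS second_att_val, -- field of an object inside an array
  get_json_object(json, '$.sel.mc')      AS sel_mc          -- field of a nested object
FROM events_raw;
```

For the record in the question these paths point at usr8, 1204, ddd and WAP respectively. Every value comes back as a string, so numeric fields need an explicit CAST if they are used in arithmetic.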
