简体   繁体   中英

hadoop - Validate json data loaded into hive warehouse

I have json files, volume is approx 500 TB. I have loaded complete set into hive data warehouse.

How would I validate or test the data that was loaded into hive warehouse. What should be my testing strategy ?

Client want us to validate the json data. Whether the data loaded into hive is correct ot not. Is there any miss? If yes, which field it was?

Please help.

How is your data being stored in hive tables ?

One option is create a Hive UDF function that receive the JSON string and validate the data and return another string with the error message or an empty string if the JSON string is well formed.

Here is a Hve UDF tutorial: http://blog.matthewrathbone.com/2013/08/10/guide-to-writing-hive-udfs.html

With the Hive UDF function in place you can executequeries like:

select strjson, validateJson(strjson) from jsonTable where validateJson(strjson) != "";

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM