
How can I validate data stored in Hadoop?

Is there a framework or library I can use to validate tuples? The validation should check types, lengths, nullability, etc. against configured validation rules. Based on the result, it should generate a validation file that identifies each tuple that failed, with a detailed message explaining why it failed.
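To make the requirement concrete, here is a minimal plain-Python sketch of rule-based tuple validation covering type, length, nullability and regular-expression checks. It is not the API of any Hadoop tool. `FieldRule`, `has_type` and `validate_tuple` are hypothetical names, and the sketch assumes an empty string represents a null field.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldRule:
    """Hypothetical per-column rule; the attribute names are illustrative only."""
    name: str
    dtype: str = "str"                 # "int", "float" or "str"
    max_length: Optional[int] = None   # maximum number of characters
    nullable: bool = True              # assumption: an empty string means null
    pattern: Optional[str] = None      # regex the whole value must match


def has_type(value: str, dtype: str) -> bool:
    """Return True if the raw string parses as the declared type."""
    try:
        if dtype == "int":
            int(value)
        elif dtype == "float":
            float(value)
        return True
    except ValueError:
        return False


def validate_tuple(fields: List[str], rules: List[FieldRule]) -> List[str]:
    """Check one tuple against its rules; return one message per failed check."""
    if len(fields) != len(rules):
        return [f"expected {len(rules)} fields, found {len(fields)}"]
    errors = []
    for value, rule in zip(fields, rules):
        if value == "":
            if not rule.nullable:
                errors.append(f"{rule.name}: null value not allowed")
            continue  # the remaining checks do not apply to a null field
        if not has_type(value, rule.dtype):
            errors.append(f"{rule.name}: {value!r} is not of type {rule.dtype}")
        if rule.max_length is not None and len(value) > rule.max_length:
            errors.append(f"{rule.name}: length {len(value)} exceeds {rule.max_length}")
        if rule.pattern is not None and re.fullmatch(rule.pattern, value) is None:
            errors.append(f"{rule.name}: {value!r} does not match pattern {rule.pattern}")
    return errors


rules = [
    FieldRule("id", dtype="int", nullable=False),
    FieldRule("name", max_length=5),
    FieldRule("zip", pattern=r"\d{5}"),
]
print(validate_tuple(["42", "Alice", "12345"], rules))   # → []
print(validate_tuple(["x1", "Alexander", ""], rules))
# → ["id: 'x1' is not of type int", 'name: length 9 exceeds 5']
```

Returning a list of messages rather than raising on the first failure lets a single pass report every problem in a tuple, which is what a validation file needs.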

Jumbune's data validation module lets you do this. It can check HDFS data for regular-expression, null, and data-type violations.

Just deploy Jumbune on a user machine, run a small JAR on the NameNode, start Jumbune, and provide the details on the HDFS validation tab: the tuple separator, the field separator, and the number and type of validations to perform.

The result contains the total number of violations, along with the file name, line number, and exact details of each violation.
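Jumbune is configured through its UI, so the following is not its implementation. It is a minimal plain-Python sketch of the same idea under stated assumptions: records are split by a configurable tuple separator and field separator, each configured check runs on its column, and the report lists the total number of violations plus the file name, record number and message for each one. `scan`, `report`, `is_int`, `not_null` and the file name `part-00000` are hypothetical, and the data is assumed to already be in memory (for example, a local copy fetched with `hdfs dfs -get`).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A check receives the raw field value and returns an error message, or None if it passes.
Check = Callable[[str], Optional[str]]


@dataclass
class Violation:
    file_name: str
    record_no: int    # 1-based position of the tuple in the file
    field_index: int  # 0-based column index
    message: str


def not_null(value: str) -> Optional[str]:
    return None if value else "null value"


def is_int(value: str) -> Optional[str]:
    try:
        int(value)
        return None
    except ValueError:
        return f"{value!r} is not an integer"


def scan(file_name: str, text: str, checks: Dict[int, Check],
         tuple_sep: str = "\n", field_sep: str = ",") -> List[Violation]:
    """Split text into tuples and fields, run each configured check, collect failures."""
    violations = []
    for record_no, record in enumerate(text.split(tuple_sep), start=1):
        if record == "":
            continue  # skip empty records, e.g. the one after a trailing newline
        fields = record.split(field_sep)
        for index, check in checks.items():
            value = fields[index] if index < len(fields) else ""  # missing column counts as null
            message = check(value)
            if message is not None:
                violations.append(Violation(file_name, record_no, index, message))
    return violations


def report(violations: List[Violation]) -> str:
    """Format a summary line followed by one line per violation."""
    lines = [f"total violations: {len(violations)}"]
    for v in violations:
        lines.append(f"{v.file_name}:{v.record_no}: field {v.field_index}: {v.message}")
    return "\n".join(lines)


found = scan("part-00000", "1|alice\nx|bob\n3|\n", {0: is_int, 1: not_null},
             tuple_sep="\n", field_sep="|")
print(report(found))
# total violations: 2
# part-00000:2: field 0: 'x' is not an integer
# part-00000:3: field 1: null value
```

With a newline as the tuple separator, the record number doubles as the line number, which matches the kind of report described above.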

I guess this module is tailor-made for your needs :)


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM