
How to move HBase tables to HDFS in Parquet format?

I have to build a tool that moves our data from HBase storage (HFiles) to HDFS in Parquet format.

Please suggest the best way to move data from HBase tables to Parquet tables.

We have to move 400 million records from HBase to Parquet. How can we achieve this, and what is the fastest way to move the data?

Thanks in advance.

Regards,

Pardeep Sharma.

Please have a look at the project tmalaska/HBase-ToHDFS, which reads an HBase table and writes it out as Text, Seq, Avro, or Parquet.

Example usage, exporting the data to Parquet:

hadoop jar HBaseToHDFS.jar ExportHBaseTableToParquet exportTest c export.parquet false avro.schema
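Whatever tool does the export, the core step is the same: each HBase row (a row key plus qualifier-to-bytes cells in one column family) has to be turned into a typed record that fits a fixed schema, which is presumably why the command above ends with a schema file argument. Below is a minimal Python sketch of that mapping, not code from the project itself. The schema layout, the helper names, and the assumptions that row keys are UTF-8 strings and that numbers were written with HBase's 8-byte big-endian long encoding are all hypothetical; the optional write_parquet helper additionally assumes pyarrow is installed.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: output column name -> (HBase qualifier, decoder).
Schema = Dict[str, Tuple[bytes, Callable[[bytes], object]]]


def decode_long(raw: bytes) -> int:
    # Assumes the value was stored as an 8-byte big-endian signed long.
    return int.from_bytes(raw, byteorder="big", signed=True)


def decode_str(raw: bytes) -> str:
    # Assumes the value was stored as UTF-8 text.
    return raw.decode("utf-8")


def row_to_record(row_key: bytes, cells: Dict[bytes, bytes], schema: Schema) -> dict:
    """Flatten one HBase row into a record with one field per schema column."""
    record = {"row_key": row_key.decode("utf-8")}  # assumes UTF-8 row keys
    for column, (qualifier, decode) in schema.items():
        raw = cells.get(qualifier)
        # Cells missing from this row become nulls in the output column.
        record[column] = decode(raw) if raw is not None else None
    return record


def write_parquet(records: List[dict], path: str) -> None:
    """Optional: write the records as one Parquet file (requires pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(records), path)


# Illustrative data only; these qualifiers are not from the original question.
EXAMPLE_SCHEMA: Schema = {
    "name": (b"name", decode_str),
    "visits": (b"visits", decode_long),
}
example = row_to_record(
    b"user1",
    {b"name": b"Ann", b"visits": (42).to_bytes(8, "big", signed=True)},
    EXAMPLE_SCHEMA,
)
```

For 400 million records this per-row mapping would run inside a distributed job (for example one task per region), with each task writing its own Parquet file, rather than in a single process.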

I recently open-sourced a patch to HBase that tackles the problem you are describing. Have a look here: https://github.com/ibm-research-ireland/hbaquet

