How to use compression techniques for the Hive metastore using HCatalog in Pig?

I have some Pig scripts that take input from plain text files using PigStorage(). I want to load and store data through the Hive metastore, so I have used HCatLoader() and HCatStorer() from HCatalog. Can someone tell me how I can store and load compressed Hive data in Pig?

Pig can generally load compressed data automatically if it was compressed with gzip or bzip2. For LZO, support has to be enabled on your cluster.

To store data in compressed form, you can put this in your script:

SET mapred.output.compress true;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

This will cause your output to be compressed using gzip.
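As a rough sketch of how those settings sit in a complete script, the example below loads a tab-delimited text file with PigStorage() and stores it again with compressed output. The paths and the two-column schema are hypothetical placeholders, not taken from the question.

```pig
-- Ask Hadoop to compress job output with gzip (same settings as above).
SET mapred.output.compress true;
SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

-- Hypothetical input path and schema; adjust to your data.
-- PigStorage reads .gz and .bz2 inputs transparently, based on the file extension.
raw = LOAD '/data/input/events.txt' USING PigStorage('\t') AS (id:int, name:chararray);

-- Hypothetical output directory; the part files written here should come out gzip-compressed.
STORE raw INTO '/data/output/events_gz' USING PigStorage('\t');
```

This only covers plain files read and written through PigStorage(); it does not by itself say anything about tables accessed through HCatalog.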

Part of the charter of HCatalog is for consumers to be completely unaware of storage concerns (like compression or formats). If the underlying storage is uncompressed at first and compressed later, you shouldn't have to rewrite your scripts to make sure that you're reading compressed data.
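To illustrate what "unaware of storage concerns" looks like in a script, here is a minimal sketch that reads and writes Hive tables through HCatalog. The table names and the id column are hypothetical, and the package name is an assumption: newer releases ship the classes under org.apache.hive.hcatalog.pig, while older HCatalog releases used org.apache.hcatalog.pig.

```pig
-- Start Pig with HCatalog on the classpath, e.g.: pig -useHCatalog script.pig
-- (script.pig is a hypothetical file name).

-- Hypothetical source table; the script names no file format, path, or codec.
events = LOAD 'default.events' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Assumes the hypothetical table has a column named id.
filtered = FILTER events BY id IS NOT NULL;

-- The target table must already exist in the metastore with a compatible schema.
STORE filtered INTO 'default.events_clean' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

Nothing in this script refers to how the table is stored, which is the design goal described above; whether compressed tables are actually handled depends on the HCatalog version in use.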

Having said that, I don't think compression support is implemented in HCatalog yet. The HCatalog Roadmap was written a long time ago, but it lists "compression" among its envisioned future features.

My guess is that you'll have to resort to using the HiveStorage class instead of HCatalog.

Disclaimer: I could be completely mistaken about this, but all the evidence I've been able to find suggests that compression is not implemented in HCatalog.
