Is it possible to compress a Parquet file that contains JSON data in a Hive external table?
I want to know how to compress a Parquet file that contains JSON data in a Hive external table. How can it be done?
I have created an external table like this:
create table parquet_table_name3(id BIGINT,created_at STRING,source STRING,favorited BOOLEAN) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' LOCATION '/user/cloudera/parquet2';
and I have set the compression property:
set parquet.compression=GZIP;
and compressed my input Parquet file by executing:
gzip <file name> (i.e. 000000_0.Parquet)
After that, I loaded the gzip-compressed file into the HDFS location /user/cloudera/parquet2
Next, I tried to run the query below:
select * from parquet_table_name3;
I am getting the result below:
NULL NULL NULL NULL
NULL NULL NULL NULL
Can you please let me know why I am getting NULL values instead of the actual rows, and how to do Parquet file compression (when the file contains JSON data) in a Hive external table? Can someone help me compress it in a Hive external table?
Duh! You can't compress an existing Parquet file "from outside". It's a columnar format with a hellishly complicated internal structure, just like ORC; the file "skeleton" requires fast random access (i.e. no compression), and each data chunk has to be compressed separately because they are accessed separately.
It's when you create a new Parquet file that you request the SerDe library to compress data inside the file, based on the parquet.compression Hive property.
At read time, the SerDe then checks the compression codec of each data file and decompresses accordingly.
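As a rough illustration of letting Hive write the compressed Parquet files itself, here is a minimal HiveQL sketch. It assumes Hive 0.13 or later (where `STORED AS PARQUET` is available); older releases would need the explicit SerDe and input/output format classes instead. The table names `json_staging` and `parquet_table_gzip` and the location `/user/cloudera/parquet_gzip` are hypothetical and not from the original post.

```sql
-- Ask the Parquet writer to gzip the data chunks inside each new file.
SET parquet.compression=GZIP;

-- Hypothetical target table; the columns mirror the table in the question.
CREATE EXTERNAL TABLE parquet_table_gzip (
  id         BIGINT,
  created_at STRING,
  source     STRING,
  favorited  BOOLEAN
)
STORED AS PARQUET
LOCATION '/user/cloudera/parquet_gzip';  -- hypothetical HDFS directory

-- json_staging is an assumed existing table holding the uncompressed rows.
-- Hive rewrites them as new Parquet files, compressed internally.
INSERT OVERWRITE TABLE parquet_table_gzip
SELECT id, created_at, source, favorited
FROM json_staging;
```

The files written this way stay ordinary Parquet files, so they should not be run through the `gzip` command afterwards.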
A quick Google search returns a couple of must-reads such as this and that.