
Hadoop ORC table uses only one mapper all the time

In my current project I am working with ORC files with Snappy compression. Whatever query I run, it runs with only one mapper. I tried configuring mapred.max.split.size and mapred.min.split.size, but the number of mappers did not change. The reducer count is fine, but because there is only a single mapper, a simple query like

select x, max(y) from z group by x;

takes almost 20 minutes to complete the map phase. Is there anything else I should do to increase the number of mappers?

Please don't tell me to use partitions or buckets; I have already used them in my table.

Try adjusting the orc.stripe.size table property (set via tblproperties).

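The default stripe size is 256 MB in older Hive releases (newer releases default to 64 MB), and technically a stripe is the smallest piece of an ORC file that can be handed to a mapper, so you get roughly one mapper per stripe. By decreasing the size of a single stripe you can increase the number of mappers. The stripe size is applied when the files are written, so existing data has to be rewritten before the change has any effect.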
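As a rough sketch of the suggestion above, the statement below rewrites the data into a new ORC table with a smaller stripe size. The table name z_small_stripes and the 16 MB value are assumptions made up for illustration, and the copy ignores the partitioning and bucketing of the original table z, so treat it as a starting point rather than a drop-in fix.

```sql
-- Hypothetical copy of table z (from the question) with smaller ORC stripes.
-- z_small_stripes is an invented name; 16777216 bytes (16 MB) is an arbitrary example value.
-- orc.stripe.size is given in bytes and only affects files written after it is set.
CREATE TABLE z_small_stripes
STORED AS ORC
TBLPROPERTIES (
  'orc.compress'    = 'SNAPPY',
  'orc.stripe.size' = '16777216'
)
AS
SELECT * FROM z;

-- Re-run the query from the question against the rewritten data
-- and compare the number of mappers in the job output.
SELECT x, MAX(y) FROM z_small_stripes GROUP BY x;
```

Smaller stripes give more potential splits but also reduce compression efficiency and increase per-stripe overhead, so it is worth trying a few sizes on a sample of the data before rewriting the whole table.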
