
Hadoop ORC table uses only one mapper every time

In my current project I am working with ORC files with Snappy compression. Whatever query I run, it runs with only one mapper. I tried configuring mapred.max.split.size and mapred.min.split.size, but the number of mappers did not change. The reducer count is fine, but because there is only a single mapper, even a simple query like

select x, max(y) from z group by x;

takes almost 20 minutes to complete the map phase. Is there anything else I can do to increase the number of mappers?

Please don't suggest partitions or buckets; I already use them in my table.

Try adjusting the orc.stripe.size table property (set through tblproperties).

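The default stripe size is 256 MB (that was the default in older Hive releases; as far as I know, Hive 0.14 and later default to 64 MB). An ORC file can only be split at stripe boundaries, so a stripe is the smallest unit that can go to a single mapper, and a file made of one large stripe will always be read by one mapper. By decreasing the stripe size you get more stripes per file and can therefore increase the number of mappers. Note that the stripe size is applied when the files are written, so existing data has to be rewritten for the change to take effect.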
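As a rough sketch of what that could look like in HiveQL: the table name z_small_stripes, the column types, and the 64 MB value are assumptions for illustration only; x, y and z come from the query in the question. Partitioning and bucketing clauses are left out, so they would need to be added to match the real table.

```sql
-- Hypothetical copy of table z with a smaller stripe size.
-- Column types are assumed; adjust them to the real schema.
CREATE TABLE z_small_stripes (
  x STRING,
  y BIGINT
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress'    = 'SNAPPY',
  'orc.stripe.size' = '67108864'   -- stripe size in bytes (64 MB here)
);

-- The stripe size only affects files written after it is set,
-- so the existing data is rewritten into the new table.
INSERT OVERWRITE TABLE z_small_stripes
SELECT x, y FROM z;
```

After the rewrite, running the same group-by query against z_small_stripes should show whether the mapper count changes. If it does not, the split-size settings mentioned in the question may still be capping the number of splits.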
