
How to change the number of mappers for ORC files using Tez?

I am trying to increase the number of map tasks. The files are in ORC format and I am using Tez as the execution engine.

I have about 2.8 GB of data spread across roughly 29 files of approximately 128 MB each.

Every time I run the query, 28 map tasks are launched. I want to increase that number.

Thanks in advance

Check these settings (see the inline comments):

set hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

set tez.grouping.min-size=16777216; -- 16 MB: smaller splits are combined into groups where possible
set tez.grouping.max-size=67108864; -- 64 MB (default is 1 GB): caps the size of each group, so the input is divided among more mappers
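To see why lowering `tez.grouping.max-size` can raise the mapper count, here is a simplified Python model of the sizing step in Tez split grouping. It is a sketch based on my reading of Tez's grouping logic, not Tez's actual code: the function `estimate_groups` is hypothetical, and the desired split count (which Tez derives from available cluster capacity and `tez.grouping.split-waves`) and the number of original splits produced by the ORC input format are values you have to supply as assumptions. Real grouping also takes data locality into account, so the actual task count can differ slightly.

```python
# Hypothetical, simplified model of Tez split-grouping sizing (not Tez's code).
MB = 1024 * 1024
GB = 1024 * MB


def estimate_groups(total_length, desired_num_splits, original_splits,
                    min_size=50 * MB, max_size=1 * GB):
    """Approximate number of grouped splits (map tasks).

    total_length       -- total input size in bytes
    desired_num_splits -- assumed initial target (cluster capacity x waves)
    original_splits    -- assumed splits produced by the input format
    min_size, max_size -- tez.grouping.min-size / tez.grouping.max-size
    """
    length_per_group = total_length // desired_num_splits
    if length_per_group > max_size:
        # Groups would be too large: derive the count from max-size instead.
        desired_num_splits = total_length // max_size + 1
    elif length_per_group < min_size:
        # Groups would be too small: derive the count from min-size instead.
        desired_num_splits = total_length // min_size + 1
    # Grouping only merges splits, so it never yields more than the originals.
    return min(desired_num_splits, original_splits)


total = int(2.8 * GB)  # ~2.8 GB of ORC data, as in the question
# Hypothetical inputs: 28 desired splits, 60 original ORC splits.
print(estimate_groups(total, 28, 60))                                  # default sizes
print(estimate_groups(total, 28, 60, min_size=16 * MB, max_size=64 * MB))
```

With these hypothetical inputs the default sizes keep 28 groups, while a 16 MB minimum and 64 MB maximum give 45. Because grouping only combines splits, in this model the count cannot exceed the number of splits the input format generated: if Hive produced only 29 splits for the ORC files, lowering `tez.grouping.max-size` alone would stop at 29.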

You can also influence the number of mappers with this setting:

set mapreduce.job.maps=128; -- prefer the grouping settings above; they are more flexible
