
How to adjust the number of mappers and reducers with Tez to speed up a Hive process

I am running a process (word labeling of sentences) over a large dataset (about 150 GB) using Tez, but it takes far too long (a week or more).

So I tried to specify the number of mappers. Although I set mapred.map.tasks=2000, the number of mappers stays at about 150, so I can't get the parallelism I want.

I set this value in the Oozie workflow file and run the job with Tez.

How can I specify the number of mappers?

Ultimately I just want to speed up the process; it does not have to use Tez.

In addition, I count the labeled sentences in the reducers, and that step also takes a long time.

I would also like to know how to adjust the memory size used by each mapper and reducer process.

In order to manually set the number of mappers in a Hive query when Tez is the execution engine, the configuration tez.grouping.split-count can be used...

... setting tez.grouping.split-count=4 will create 4 mappers.

https://community.pivotal.io/s/article/How-to-manually-set-the-number-of-mappers-in-a-TEZ-Hive-job
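For example, a minimal sketch of what this might look like in a Hive session (the table name labeled_sentences is hypothetical, and tez.grouping.split-count is a hint, so Tez may still adjust the actual mapper count based on split grouping):

    -- run on the Tez engine and ask Tez for roughly 2000 grouped splits (mappers)
    SET hive.execution.engine=tez;
    SET tez.grouping.split-count=2000;

    -- the setting applies to queries run afterwards in the same session, e.g.:
    SELECT label, COUNT(*) AS cnt
    FROM labeled_sentences
    GROUP BY label;

If the job is launched through Oozie, the same properties can be passed as configuration for that Hive action instead of SET statements.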


However, overall, you should optimize the storage format and the Hive partitions before you even begin tuning the Tez settings. Do not try to process data STORED AS TEXTFILE in Hive. Convert it to ORC or Parquet first.
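A minimal sketch of such a conversion using CREATE TABLE AS SELECT (the table names sentences_text and sentences_orc are hypothetical):

    -- create an ORC copy of the raw text table and point the heavy queries at it
    CREATE TABLE sentences_orc
    STORED AS ORC
    AS SELECT * FROM sentences_text;

After the conversion, the labeling and counting queries should read from sentences_orc rather than the original text table.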

If Tez isn't working out for you, you can always try Spark. Labelling sentences is probably also covered by an existing Spark MLlib workflow you can find somewhere.
