
How many MapReduce jobs will run if I query a partitioned table in Hive

This may seem a little silly, but I just want to know the exact answer. Suppose I have a table with 2 partitions. If I run a query filtered on the partition column, how many map tasks will run in the background?

Any help would be greatly appreciated!

Thanks in advance

I've read that the number of mappers is determined by the formula: input size divided by block size. The default block size in Hadoop 2 is 128 MB.

Therefore I assume you could divide the total size of the files in that partition by 128 MB.
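That back-of-the-envelope estimate can be written out as a quick calculation. This is only a sketch of the rule of thumb above (one mapper per 128 MB block of input in the queried partition); the partition size used here is a made-up example, and real split computation also depends on input format and split-size settings.

```python
import math

# Hadoop 2 default HDFS block size: 128 MB.
BLOCK_SIZE = 128 * 1024 * 1024

def estimate_mappers(partition_bytes: int) -> int:
    """Ceiling of input size over block size, with a minimum of one mapper."""
    return max(1, math.ceil(partition_bytes / BLOCK_SIZE))

# Hypothetical partition of 300 MB: spans three 128 MB blocks.
print(estimate_mappers(300 * 1024 * 1024))  # -> 3
```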

So this depends on two things:

  1. By default, with non-splittable files, Hadoop runs one map task per input file. So if your partition folder contains 100 input files, it will run 100 mappers. This is the case, for example, for gzip-compressed text files.

  2. If your files are splittable, Hadoop splits them based on your block size settings. This requires a splittable file format, such as sequence files or plain (uncompressed) tab-delimited text files.

It's easiest to reason about if you just use simple flat files. Hope that helps.
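The two cases above can be modeled with a small sketch. This is an assumption-laden simplification (file sizes and the 128 MB block size are illustrative; Hive's actual input formats can also combine small files into fewer splits), not how Hadoop computes splits internally.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed Hadoop 2 default, in bytes

def estimate_mappers_for_files(file_sizes: list[int], splittable: bool) -> int:
    """Model the two cases for the files in one partition folder."""
    if not splittable:
        # Case 1 - non-splittable (e.g. gzip): one mapper per input file.
        return len(file_sizes)
    # Case 2 - splittable (e.g. sequence files): one mapper per block
    # of each file, with at least one mapper per file.
    return sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes)

# Hypothetical partition holding a single 500 MB file:
one_big_file = [500 * 1024 * 1024]
print(estimate_mappers_for_files(one_big_file, splittable=False))  # -> 1
print(estimate_mappers_for_files(one_big_file, splittable=True))   # -> 4

# Hypothetical partition holding 100 small files:
many_small_files = [10 * 1024 * 1024] * 100
print(estimate_mappers_for_files(many_small_files, splittable=False))  # -> 100
```

Note how the splittable case turns one 500 MB file into four map tasks (ceil(500/128) = 4), while the non-splittable case is driven purely by the file count.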
