
How to make Hive run MapReduce jobs concurrently?

I'm new to Hive and have run into a problem.

I have a Hive table like this:

create table td(id int, time string, ip string, v1 bigint, v2 int, v3 int,
  v4 int, v5 bigint, v6 int)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';

Then I run a query like this:

from td
INSERT OVERWRITE  DIRECTORY '/tmp/total.out' select count(v1)
INSERT OVERWRITE  DIRECTORY '/tmp/totaldistinct.out' select count(distinct v1)
INSERT OVERWRITE  DIRECTORY '/tmp/distinctuin.out' select distinct v1

INSERT OVERWRITE  DIRECTORY '/tmp/v4.out' select v4 , count(v1), count(distinct v1) group by v4
INSERT OVERWRITE  DIRECTORY '/tmp/v3v4.out' select v3, v4 , count(v1), count(distinct v1) group by v3, v4

INSERT OVERWRITE  DIRECTORY '/tmp/v426.out' select count(v1), count(distinct v1)  where v4=2 or v4=6
INSERT OVERWRITE  DIRECTORY '/tmp/v3v426.out' select v3, count(v1), count(distinct v1) where v4=2 or v4=6 group by v3

INSERT OVERWRITE  DIRECTORY '/tmp/v415.out' select count(v1), count(distinct v1)  where v4=1 or v4=5
INSERT OVERWRITE  DIRECTORY '/tmp/v3v415.out' select v3, count(v1), count(distinct v1) where v4=1 or v4=5 group by v3

It works, and the output is what I want.

But there is one problem: Hive generates 9 MapReduce jobs and runs them one by one.

I ran EXPLAIN on this query and got the following output:

STAGE DEPENDENCIES:
  Stage-9 is a root stage
  Stage-0 depends on stages: Stage-9
  Stage-10 depends on stages: Stage-9
  Stage-1 depends on stages: Stage-10
  Stage-11 depends on stages: Stage-9
  Stage-2 depends on stages: Stage-11
  Stage-12 depends on stages: Stage-9
  Stage-3 depends on stages: Stage-12
  Stage-13 depends on stages: Stage-9
  Stage-4 depends on stages: Stage-13
  Stage-14 depends on stages: Stage-9
  Stage-5 depends on stages: Stage-14
  Stage-15 depends on stages: Stage-9
  Stage-6 depends on stages: Stage-15
  Stage-16 depends on stages: Stage-9
  Stage-7 depends on stages: Stage-16
  Stage-17 depends on stages: Stage-9
  Stage-8 depends on stages: Stage-17

It seems that stages 9-17 correspond to MapReduce jobs 0-8.
But according to the EXPLAIN output above, stages 10-17 depend only on stage 9,
so my question is: why can't jobs 1-8 run concurrently?

Or how can I make jobs 1-8 run concurrently?

Thank you very much for your help!

In hive-default.xml there is a property named "hive.exec.parallel" that enables jobs to be executed in parallel. Its default value is "false"; change it to "true" to turn this on. A second property, "hive.exec.parallel.thread.number", controls the maximum number of jobs that can be executed in parallel.
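As a minimal sketch, both properties can also be set for a single session from the Hive CLI before running the multi-insert query, instead of editing the configuration file. The thread count of 8 below is only an illustrative choice (it happens to match the eight stages that depend on Stage-9 in the plan above); pick a value that suits your cluster.

```sql
-- Allow independent stages of one query to be submitted at the same time.
SET hive.exec.parallel=true;

-- Upper bound on how many jobs may run in parallel (8 is an assumed example value).
SET hive.exec.parallel.thread.number=8;

-- Print the current value to confirm the setting took effect in this session.
SET hive.exec.parallel;
```

With these set, stages that depend only on Stage-9 become eligible to start together once Stage-9 finishes. Whether they actually overlap still depends on how many map and reduce slots the cluster has free.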

For more details, see https://issues.apache.org/jira/browse/HIVE-549
