Connect Apache Pig To Hadoop Cluster

I'm using Apache Pig to do some data analysis work with a Hadoop cluster. I deployed one master node and 32 slave nodes in the cluster. However, when I use Pig to run scripts in MapReduce mode against that cluster, each job always launches only one map task and one reduce task. How can I set up Pig or Hadoop to make use of all 32 slaves?

The job status is shown below:

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime    Alias   Feature Outputs
job_1457865367374_0001  1   1   88  88  88  88  27  27  27  27  1-1,access_grouped,access_summed,cleaned,named,raw,timed,timed_grouped,timed_summed    MULTI_QUERY 
job_1457865367374_0002  1   1   5   5   5   5   5   5   5   5   access_ordered  SAMPLER 
job_1457865367374_0003  2   1   10  10  10  10  6   6   6   6   density,density_scored  HASH_JOIN   
job_1457865367374_0004  1   1   5   5   5   5   5   5   5   5   timed_ordered   SAMPLER 
job_1457865367374_0005  1   1   5   5   5   5   5   5   5   5   timed_ordered   ORDER_BY    hdfs://master:54310/user/ubuntu/Data/timed_ordered,
job_1457865367374_0006  1   1   5   5   5   5   5   5   5   5   access_ordered  ORDER_BY    hdfs://master:54310/user/ubuntu/Data/access_ordered,
job_1457865367374_0007  1   1   5   5   5   5   5   5   5   5   density_ordered SAMPLER 
job_1457865367374_0008  1   1   5   5   5   5   5   5   5   5   density_ordered ORDER_BY    hdfs://master:54310/user/ubuntu/Data/density_ordered,

By the way, I installed Apache Pig on the master machine.

SET default_parallel xyz

In Pig, the command above lets you set the number of parallel (reduce) tasks. The Hadoop framework itself determines the number of mappers, based on the number of input splits, while the number of reducers can be set at the cluster level or the application level. So you cannot set the number of mappers for your application, but you can set the number of reducers.
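As a minimal sketch of the two ways to request reduce-side parallelism in Pig Latin (the input path, schema, and aliases below are illustrative, not taken from the original script):

```pig
-- Script-wide default: every reduce phase may use up to 32 reduce tasks.
SET default_parallel 32;

-- Hypothetical input and schema, for illustration only.
raw = LOAD 'hdfs://master:54310/user/ubuntu/Data/input'
      AS (key:chararray, value:int);

-- This GROUP's reduce phase inherits default_parallel (32 reducers).
grouped = GROUP raw BY key;
summed  = FOREACH grouped GENERATE group, SUM(raw.value);

-- Per-operator override: this reduce phase runs with 16 reducers
-- regardless of default_parallel.
grouped16 = GROUP raw BY key PARALLEL 16;

STORE summed INTO 'hdfs://master:54310/user/ubuntu/Data/summed';
```

Note that neither setting affects the map side: the number of map tasks still follows the number of input splits, so a small input (or a single unsplittable file) will still produce only one mapper, which matches the one-map jobs in the stats above.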
