Trouble in Hadoop single pseudo-distributed node cluster

Question

I'm trying to set up a Hadoop server in pseudo-distributed mode so that map/reduce tasks can be executed in parallel. Right now, when I run a job, the console outputs the following line:

Running job: job_local1508664063_0001

This means I'm in local mode, so it's expected that all tasks run sequentially. This is my current config. What do I have to edit to let Hadoop run map tasks and reduce tasks in parallel? (I start the Hadoop server using start-dfs and start-yarn.)

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>mymachine:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>

    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>mymachine:50030</value>
        <description>The host and port that the MapReduce job tracker runs
        at.  If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>

</configuration>

mymachine is the account name of the server. I have also tried with the IP, with the same result: the job manager still considers the server "local". The current job creates 12 map tasks, and these run sequentially.

The behavior matches what is reported in this thread:

stackoverflow.com/questions/26267476/why-my-map-reduce-job-is-running-sequentially

PS: to be sure that the configs are loaded, in my Java web service I redundantly set them with:

conf.set("mapreduce.jobtracker.address", "mymachine:54311");
conf.set("mapreduce.jobtracker.http.address", "mymachine:50030");

I also set resources to allow multiple containers, and therefore parallel map tasks (i7, 4 cores / 8 threads, 8 GB RAM):

conf.set("yarn.nodemanager.resource.memory-mb", "6144");
conf.set("yarn.nodemanager.resource.cpu-vcores", "8");
conf.set("yarn.scheduler.minimum-allocation-mb", "1024");
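As a rough check of what these values allow, here is a minimal sketch of the memory arithmetic. It assumes the values are actually in effect on the NodeManager, that the scheduler only accounts for memory and rounds each request up to a multiple of the minimum allocation, and that the stock 2.7 defaults apply for the application master (1536 MB) and for each map task (1024 MB). The class and method names are hypothetical, not part of Hadoop.

```java
// Rough estimate of how many task containers one NodeManager can run at once.
// The defaults passed in main() are assumptions (stock Hadoop 2.7 values).
class ContainerEstimate {

    // Round a memory request up to the next multiple of the minimum allocation.
    static int roundUp(int requestMb, int minAllocationMb) {
        return ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    // Memory left after the application master, divided by the per-task container size.
    static int maxConcurrentTasks(int nodeMemoryMb, int minAllocationMb,
                                  int amRequestMb, int taskRequestMb) {
        int amMb = roundUp(amRequestMb, minAllocationMb);
        int taskMb = roundUp(taskRequestMb, minAllocationMb);
        return Math.max(0, (nodeMemoryMb - amMb) / taskMb);
    }

    public static void main(String[] args) {
        // 6144 MB node memory and 1024 MB minimum allocation come from the settings above.
        int tasks = maxConcurrentTasks(6144, 1024, 1536, 1024);
        System.out.println("Concurrent task containers: " + tasks);
    }
}
```

Under those assumptions the application master occupies 2048 MB, leaving room for about 4 task containers at a time, so the 12 map tasks would run in waves rather than all at once.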

How should I modify my config? My Hadoop version is 2.7.1.

Answer 1

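In Hadoop 2.x there is no JobTracker and there are no TaskTrackers; those belong to Hadoop 1.x. In 2.x, YARN's ResourceManager and NodeManagers take over their role, so the mapreduce.jobtracker.* settings do not point your job at the cluster.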
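To illustrate, here is a small sketch that checks a set of client-side settings for Hadoop 1.x leftovers and for the property that selects YARN. It works on a plain Map rather than Hadoop's Configuration class so that it runs without any Hadoop jars; LegacyKeyCheck and its methods are hypothetical helpers, not a Hadoop API. It relies on mapreduce.framework.name falling back to "local" when unset. Whether your web service actually picks up mapred-site.xml is something I can only guess at, so treat the example input as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): inspects plain key/value settings.
class LegacyKeyCheck {

    // Keys with this prefix configure the Hadoop 1.x JobTracker.
    static final String JOBTRACKER_PREFIX = "mapreduce.jobtracker.";

    // Collect the keys that only made sense with a JobTracker.
    static List<String> legacyKeys(Map<String, String> settings) {
        List<String> found = new ArrayList<>();
        for (String key : settings.keySet()) {
            if (key.startsWith(JOBTRACKER_PREFIX)) {
                found.add(key);
            }
        }
        return found;
    }

    // A job goes to YARN only if mapreduce.framework.name is "yarn";
    // when the key is missing, the fallback value is "local".
    static boolean submitsToYarn(Map<String, String> settings) {
        return "yarn".equals(settings.getOrDefault("mapreduce.framework.name", "local"));
    }

    public static void main(String[] args) {
        // Assumed input: only the values set programmatically in the question.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.jobtracker.address", "mymachine:54311");
        settings.put("mapreduce.jobtracker.http.address", "mymachine:50030");

        System.out.println("Legacy keys: " + legacyKeys(settings));
        System.out.println("Submits to YARN: " + submitsToYarn(settings));
    }
}
```

If the second check comes out false for the configuration your client really sees, that would explain a job ID starting with job_local. In that case, setting mapreduce.framework.name to yarn on the client side is the first thing to try.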

I maintain a script on GitHub that sets up Hadoop from scratch. You may find it useful. It contains a minimal Hadoop configuration to get started.

https://github.com/hadoopfromscratch/hadoopfromscratch/

Answer 2

You can use the free and open-source Apache Ambari to install, configure, and manage a full Hadoop cluster, either single-node or multi-node, with all the configuration done from the web UI or by storing your config templates in version control.

DEPLOYING, MANAGING AND CONFIGURING HDP WITH AMBARI
