Hadoop Multi-Node Cluster Installation on Ubuntu Issue - Troubleshoot

I have three Ubuntu 12.04 LTS computers that I want to install Hadoop on in a Master/Slave configuration as described here. It says to first install Hadoop as a single node and then proceed to multi-node. The single-node installation works perfectly fine. I made the required changes to the /etc/hosts file and configured everything just as the guide says, but when I start the Hadoop cluster on the master, I get an error.

My computers are aptly named ironman, superman and batman, with batman (who else?) being the master node. When I do sudo bin/start-dfs.sh, the following shows up.

[screenshot: output of sudo bin/start-dfs.sh]

When I enter the password, I get this:

[screenshot: output after entering the password]

When I try sudo bin/start-all.sh, I get this:

[screenshot: output of sudo bin/start-all.sh]

I can ssh into the different terminals, but there's something that's not quite right. I checked the logs on the superman/slave terminal and it says that it can't connect to batman:54310, plus some "zzz" message. I figured my /etc/hosts was wrong, but in fact this is what it contains:

[screenshot: contents of /etc/hosts]
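
The entries are along these lines (the IP addresses below are placeholders rather than the actual ones from the screenshot; each of the three machines has the same set of lines):

192.168.1.10    batman
192.168.1.11    superman
192.168.1.12    ironman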

I tried to open port 54310 by changing iptables, but the output screens shown here are after I made the changes. I'm at my wit's end. Please tell me where I'm going wrong. Please do let me know if you need any more information and I will update the question accordingly. Thanks!
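
For reference, the rule I used to open the port was roughly of this shape (a sketch, not necessarily the exact rule), and the port can then be probed from one of the slaves with netcat:

sudo iptables -A INPUT -p tcp --dport 54310 -j ACCEPT   # on batman: allow inbound connections to the NameNode port
nc -zv batman 54310                                      # on superman/ironman: check whether the port answers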

UPDATE: Here are my conf files.

core-site.xml (note: I had originally put batman:54310 instead of the IP address; I only changed it to the IP because I thought it would make the binding more explicit)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://130.65.153.195:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>130.65.153.195:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

</configuration>

My conf/masters file is simply batman, and my conf/slaves file is just:

batman
superman
ironman

Hope this clarifies things.

First things first: make sure you can ping the master from the slaves and the slaves from the master. Log in to each machine individually and ping the other two hosts, and make sure they are reachable via their hostnames. It is possible that you have not added the /etc/hosts entries on the slaves.
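
For example, each of these should come back with replies when run from every box (assuming ICMP isn't blocked):

ping -c 3 batman
ping -c 3 superman
ping -c 3 ironman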

Secondly, you need to set up passwordless SSH access. You can use ssh-keygen -t rsa and ssh-copy-id for this. This will help remove the password prompts. It is a good idea to create a separate user for this (and not use root).
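
A rough sketch of that setup, run on the master as the Hadoop user (hduser is just a placeholder name here):

ssh-keygen -t rsa -P ""                            # generate a key pair with an empty passphrase
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@superman   # install the public key on each slave
ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@ironman
ssh hduser@superman                                # should now log in without a password prompt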

If this doesn't help, please post your log output.
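
With a standard tarball install, the daemon logs sit under the logs directory of the Hadoop installation, so something like this (path is an assumption based on a typical layout) will show the most recent DataNode errors:

tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log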
