
Hadoop 3.2.0 doesn't work in cluster (VirtualBox)

I'm trying to set up a test Hadoop cluster in VirtualBox with 1 namenode and 2 datanodes. I followed several tutorials, but when I run start-dfs.sh on the namenode it starts only the namenode processes, not the datanodes.

I'm able to start each daemon individually, but they don't seem to be working as a cluster.
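
(By "individually" I mean running the per-daemon commands on each VM; in Hadoop 3.x these replace the old hadoop-daemon.sh scripts, so something like:)

hadoop@namenode:~$ hdfs --daemon start namenode
hadoop@datanode1:~$ hdfs --daemon start datanode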

Basically, I set up 1 server (Debian 9) and configured a static IP for each VM:

hadoop@namenode:~$ cat /etc/hosts
127.0.0.1   localhost namenode
192.168.10.100 namenode.com
192.168.10.161 datanode1.com
192.168.10.162 datanode2.com
hadoop@namenode:~$ cat hadoop/etc/hadoop/slaves
datanode1.com
datanode2.com
hadoop@namenode:~$ cat hadoop/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://namenode.com:9000</value>
        </property>
</configuration>
hadoop@namenode:~$ cat hadoop/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/data/nameNode</value>
    </property>
    <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/data/dataNode</value>
    </property>
    <property>
            <name>dfs.replication</name>
            <value>1</value>
    </property>
</configuration>

I copied all the configs to all the VMs, logged into the namenode, and formatted it with hdfs namenode -format.
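
("Copied all the configs" just means syncing the Hadoop config directory to every VM; roughly the following, assuming the same ~/hadoop layout on each node:)

hadoop@namenode:~$ rsync -av hadoop/etc/hadoop/ datanode1.com:hadoop/etc/hadoop/
hadoop@namenode:~$ rsync -av hadoop/etc/hadoop/ datanode2.com:hadoop/etc/hadoop/
hadoop@namenode:~$ hdfs namenode -format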

If I check, the clusterID is consistent on all the servers:

hadoop@namenode:~$ cat data/dataNode/current/VERSION
#Sat Mar 09 07:58:36 EST 2019
storageID=DS-cc3b3c25-46c8-467c-8a7b-2311f82e9790
clusterID=CID-b0b63b58-73bd-4e6b-85cd-31c353052db6
cTime=0
datanodeUuid=d9a14382-7694-476c-864b-9164de01a92e
storageType=DATA_NODE
layoutVersion=-57
hadoop@namenode:~$ cat data/nameNode/current/VERSION
#Sat Mar 09 07:55:26 EST 2019
namespaceID=1109263708
clusterID=CID-b0b63b58-73bd-4e6b-85cd-31c353052db6
cTime=1551735568343
storageType=NAME_NODE
blockpoolID=BP-1318860827-127.0.0.1-1551735568343
layoutVersion=-65
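
(A quick way to compare it across nodes, assuming the datanodes use the same data paths as above:)

hadoop@namenode:~$ grep clusterID data/nameNode/current/VERSION
hadoop@namenode:~$ ssh datanode1.com grep clusterID data/dataNode/current/VERSION
hadoop@namenode:~$ ssh datanode2.com grep clusterID data/dataNode/current/VERSION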

I don't see anything too weird in the logs, other than:

hadoop@namenode:~$ cat hadoop/logs/* | grep ERROR
2019-03-04 17:40:24,433 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,441 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
2019-03-09 07:57:10,818 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,397 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,417 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 1: SIGHUP
2019-03-09 07:57:09,420 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:29:25,258 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,434 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,441 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 1: SIGHUP
2019-03-04 17:40:24,420 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2019-03-04 17:40:24,430 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 1: SIGHUP
2019-03-04 17:40:24,593 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2019-03-04 17:40:24,791 ERROR org.apache.hadoop.yarn.event.EventDispatcher: Returning, interrupted : java.lang.InterruptedException
2019-03-04 17:40:24,797 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2019-03-04 17:40:24,406 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: RECEIVED SIGNAL 15: SIGTERM
cat: hadoop/logs/userlogs: Is a directory
2019-03-04 17:40:24,418 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: RECEIVED SIGNAL 1: SIGHUP
2019-03-09 07:57:14,149 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: RECEIVED SIGNAL 15: SIGTERM
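
(The SIGTERM/SIGHUP lines above are just from the daemons being shut down. The datanode logs live on the datanodes themselves, so it's worth grepping there as well; assuming the default log file naming, something like:)

hadoop@namenode:~$ ssh datanode1.com "grep ERROR hadoop/logs/hadoop-hadoop-datanode-*.log"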

I already tried deleting the data folders and reformatting, but it's still not working.
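
(That is, roughly this on every node, using the paths from my hdfs-site.xml, before running the format again:)

hadoop@namenode:~$ rm -rf data/nameNode data/dataNode
hadoop@namenode:~$ hdfs namenode -format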

Any ideas?

After a few days working on this, I realized what the issue was:

- when following a tutorial, be sure that core-site.xml has the property fs.defaultFS and not the deprecated fs.default.name
- I had been adding the datanodes to etc/hadoop/slaves, but I was missing the etc/hadoop/workers file (in Hadoop 3.x the slaves file was renamed to workers), as shown below
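
(the new workers file, mirroring what I previously had only in slaves:)

hadoop@namenode:~$ cat hadoop/etc/hadoop/workers
datanode1.com
datanode2.com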

After adding the datanodes there, I reformatted and started the cluster again, and now it works.
