
Access to a DataNode running on macOS can't be established from a Docker container

I'm not experienced with HDFS and I've run into a problem with HDFS running on my MacBook. I have an HDFS client that runs in a Docker container, and every time I try to put or get data to/from HDFS from this container I get the following error:

hdfs dfs -put /core-site.xml hdfs://host.docker.internal:9000/abcs
21/03/02 07:28:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/03/02 07:28:48 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1610)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
21/03/02 07:28:48 INFO hdfs.DFSClient: Abandoning BP-1485605719-127.0.0.1-1614607405999:blk_1073741832_1008
21/03/02 07:28:48 INFO hdfs.DFSClient: Excluding datanode 127.0.0.1:9866
21/03/02 07:28:48 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /abcs/core-site.xml._COPYING_ could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

It can be clearly seen that my client (the container) receives the wrong IP address for the DataNode (127.0.0.1:9866); it should be something like 192.168.65.2:9866, i.e. host.docker.internal, or the domain name of my laptop (e.g. my-laptop).
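A quick way to confirm the mismatch from inside the container (assuming nc is available in the client image; this check is mine, not part of the original post) is to probe both addresses. The DataNode port is reachable through the Docker host, but not via the container's own loopback:

    # run inside the client container
    nc -vz host.docker.internal 9866   # reaches the DataNode on the macOS host
    nc -vz 127.0.0.1 9866              # connection refused: nothing listens on the container's loopback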

My core-site.xml (of course, my-laptop is bound to 127.0.0.1 in /etc/hosts):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my-laptop:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Users/Ian_Rakhmatullin/localHadoopTmp</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>my-laptop:9866</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>my-laptop:9864</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>my-laptop:9867</value>
    </property>
</configuration>

One more thing that confuses me is that in the HDFS web UI I can see that the DataNode is running on localhost:9866 (127.0.0.1:9866), but I expect "my-laptop:9866" there as well.

Does anyone have any thoughts on how to resolve this issue? Thank you.

It seems I've solved this problem by following these steps:

  1. Add the dfs.datanode.hostname property to your hdfs-site.xml (the one used by the NameNode and DataNode).

hdfs-site.xml:

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.datanode.hostname</name>
        <value>my-laptop</value>
    </property>

core-site.xml is the same as in my question.
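After changing hdfs-site.xml, the HDFS daemons need to be restarted so the DataNode re-registers under its hostname. A minimal sketch, assuming a standard single-node setup with the Hadoop sbin scripts available (the restart step is not spelled out in the original answer):

    # restart HDFS so the DataNode registers with dfs.datanode.hostname
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh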

  2. Add dfs.client.use.datanode.hostname to the hdfs-site.xml used by your HDFS client:
    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
  3. Map the DNS name (my-laptop in my case) to the IP address of your Docker host (host.docker.internal in my case -> 192.168.65.2) in the container's /etc/hosts (see the docker run example after this list):
192.168.65.2 my-laptop
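One way to get that mapping into the container is Docker's --add-host flag, assuming the client is started with docker run (the flag is standard Docker, but the exact command and image name below are illustrative, not from the original answer):

    # hypothetical example: start the HDFS client container with the hostname mapping added
    docker run --add-host my-laptop:192.168.65.2 -it my-hdfs-client-image bash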

With this approach, the NameNode returns the hostname of your DataNode to the HDFS client, and the client then uses your mapping to reach it via host.docker.internal. This is exactly what I needed.
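After these changes, re-running the put command from the question should succeed; listing the target directory is just a sanity check, not part of the original answer:

    hdfs dfs -put /core-site.xml hdfs://host.docker.internal:9000/abcs
    hdfs dfs -ls hdfs://host.docker.internal:9000/abcs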
