
Cannot get InputStream of a File in an HDFS Cluster from a Java App External to the Cluster's Network

My goal: to read an InputStream of a file stored in an HDFS cluster that is outside my local machine's network.

I have a Java app on my local machine, and the cluster resides in a different network. I am completely new to Hadoop, so I have a couple of questions:

1) How do I find the IP address and port I should be connecting to for the master node? I have access to the Hadoop cluster's config files.

2) Should I treat this as a WebHDFS use case because the application is outside the cluster's network, or does the term WebHDFS simply refer to apps that talk to the Hadoop file system over HTTP?

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data01/hadoop-data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data01/hadoop-data/datanode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>ipc.maximum.data.length</name>
        <value>134217728</value>
    </property>
</configuration>

core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

Connecting From Java Local App to the Cluster in another Network

    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // fs.defaultFS should point at the NameNode RPC address only;
    // the file itself is addressed with a Path.
    String clusterURI = "hdfs://<MASTER_NODE_EXTERNAL_IP>:9000";
    String filePath = "/user/ubuntu/testfolder/fileA.xml";

    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", clusterURI);
    System.setProperty("HADOOP_USER_NAME", "ubuntu");
    System.setProperty("hadoop.home.dir", "/");

    FileSystem fs = FileSystem.get(URI.create(clusterURI), conf);

    InputStream is = null;
    try {
        is = fs.open(new Path(filePath));
        IOUtils.copyBytes(is, System.out, 4096, false);
    } finally {
        IOUtils.closeStream(is);
    }

I have tried a bunch of different ports for the master node, but none return any file contents; every attempt from outside the cluster throws an exception.

The same app deployed inside the cluster, pointed at the internal IP of the master node, works: the master redirects the app to the slave node holding the file, and the InputStream is printed perfectly to System.out.

Forgive my ignorance, but is there something fundamental I am missing here with the HDFS setup? I am almost certain a config change is needed on the cluster before I can connect remotely...

This may be related to network settings: the DataNode ports are typically not accessible from outside the cluster (which is usually good security practice).
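If the DataNode ports can be opened or forwarded, a client-side setting that often helps external clients is dfs.client.use.datanode.hostname: the client then connects to DataNodes by hostname instead of the internal IPs the NameNode reports, so those hostnames can be resolved to externally reachable addresses. A sketch for the client's hdfs-site.xml, assuming the DataNode hostnames resolve from the client machine:

```xml
<property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
</property>
```

The same flag can also be set programmatically with conf.set("dfs.client.use.datanode.hostname", "true") before creating the FileSystem.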

You can use WebHDFS to read an HDFS file from an external application. It is indeed a REST API over HTTP, documented here: https://hadoop.apache.org/docs/r1.2.1/webhdfs.html and here: https://bighadoop.wordpress.com/2013/06/02/hadoop-rest-api-webhdfs/
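As a sketch of that approach, using only the JDK (no Hadoop client libraries): the port and user below are assumptions — 9870 is the default NameNode HTTP port in Hadoop 3.x (50070 in 2.x), and "ubuntu" matches the user in the question.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsRead {

    // Builds the WebHDFS OPEN URL. Check dfs.namenode.http-address in
    // hdfs-site.xml for the real NameNode web port of your cluster.
    static String openUrl(String host, int port, String path, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + path
                + "?op=OPEN&user.name=" + user;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: WebHdfsRead <namenode-host> <hdfs-path>");
            return;
        }
        String url = openUrl(args[0], 9870, args[1], "ubuntu");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // The NameNode answers OPEN with a redirect to the DataNode that
        // holds the block; HttpURLConnection follows it automatically.
        conn.setInstanceFollowRedirects(true);
        try (InputStream in = conn.getInputStream()) {
            in.transferTo(System.out);
        }
    }
}
```

Note that OPEN is answered with an HTTP redirect to a DataNode (default web port 9864 in Hadoop 3.x), so the DataNode web ports must also be reachable from the client, which ties back to the network-access point above.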

