
Dataproc HDFS file URIs

How do I get the path/URL of a file stored in Dataproc HDFS? I want to run a MapReduce job on a file that is located in Dataproc HDFS.

The following are all valid HDFS URIs in a Dataproc cluster:

  1. hdfs://<master-hostname>:8020/<path-to-file>
  2. hdfs://<master-hostname>/<path-to-file>
  3. hdfs:///<path-to-file>

The third form works because, by default, every node of a Dataproc cluster has the fs.defaultFS property set to hdfs://<master-hostname> in /etc/hadoop/conf/core-site.xml, so a URI without a host resolves against the master's NameNode. The second form works because 8020 is the default NameNode port.

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<master-hostname></value>
    <description>
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The uri's scheme determines
      the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The uri's authority is used to determine the
      host, port, etc. for a filesystem.
    </description>
  </property>
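
To make the equivalence of the three forms concrete, here is a minimal Python sketch that reads fs.defaultFS from a core-site.xml string and expands each form to the explicit host:port form. It only imitates the resolution described above; it is not Hadoop's own code. The hostname my-cluster-m, the path /user/data.txt, and the helper names read_default_fs and qualify are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Hypothetical core-site.xml content; "my-cluster-m" is an assumed master hostname.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://my-cluster-m</value>
  </property>
</configuration>"""

DEFAULT_NAMENODE_PORT = 8020  # default NameNode port, as stated in the answer


def read_default_fs(core_site_xml):
    """Return the fs.defaultFS value found in core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value").strip()
    raise KeyError("fs.defaultFS not found")


def qualify(uri, default_fs):
    """Expand an hdfs URI to the explicit hdfs://host:port/path form."""
    parts = urlsplit(uri)
    default = urlsplit(default_fs)
    # A missing host falls back to fs.defaultFS; a missing port to 8020.
    host = parts.hostname or default.hostname
    port = parts.port or default.port or DEFAULT_NAMENODE_PORT
    return urlunsplit(("hdfs", f"{host}:{port}", parts.path, "", ""))


default_fs = read_default_fs(CORE_SITE)
for uri in (
    "hdfs://my-cluster-m:8020/user/data.txt",
    "hdfs://my-cluster-m/user/data.txt",
    "hdfs:///user/data.txt",
):
    print(uri, "->", qualify(uri, default_fs))
```

All three inputs expand to the same fully qualified URI, which is why any of them can be passed as an input path.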

You can run hadoop fs -ls <uri> on any node to list the files.
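
If you want to do the same check from a script, the command above could be wrapped roughly as follows. This is only a sketch: it assumes the hadoop CLI is on the PATH (as it is on a cluster node), and the helper names ls_command and list_files are hypothetical.

```python
import shutil
import subprocess


def ls_command(uri):
    """Build the argument list for `hadoop fs -ls <uri>`."""
    return ["hadoop", "fs", "-ls", uri]


def list_files(uri):
    """Run `hadoop fs -ls <uri>` and return its standard output as text."""
    if shutil.which("hadoop") is None:
        # Assumption: this is meant to run on a node where hadoop is installed.
        raise RuntimeError("hadoop CLI not found; run this on a cluster node")
    result = subprocess.run(
        ls_command(uri), capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a cluster node, calling list_files("hdfs:///") should return the same listing that the command prints when run by hand.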
