
List files from an HDFS cluster

I am new to Hadoop. I am trying to access a Hadoop cluster (HDFS) and retrieve a list of files from a Java client running in Eclipse. After setting up the required configuration on the Hadoop Java client, I can perform copyFromLocalFile and copyToLocalFile operations against HDFS.

Here is the problem I am facing. When I call the listFiles() method and print the results, I get:

org.apache.hadoop.fs.LocatedFileStatus@d0085360
org.apache.hadoop.fs.LocatedFileStatus@b7aa29bf

Main method:

Properties props = new Properties();
props.setProperty("fs.defaultFS", "hdfs://<IPOFCLUSTER>:8020");
props.setProperty("mapreduce.jobtracker.address", "<IPOFCLUSTER>:8032");
props.setProperty("yarn.resourcemanager.address", "<IPOFCLUSTER>:8032");
props.setProperty("mapreduce.framework.name", "yarn");
FileSystem fs = FileSystem.get(toConfiguration(props)); // Setting up the required configurations
Path p4 = new Path("/user/myusername/inputjson1/");
RemoteIterator<LocatedFileStatus> ritr = fs.listFiles(p4, true);
while (ritr.hasNext()) {
    System.out.println(ritr.next().toString());
}

I have also tried FileContext, but I only got the same kind of FileStatus object string. Is it possible to get the file names while iterating over the remote HDFS directory? There is a method called getPath(). Is that the only way to retrieve the full path of each file with the Hadoop API, or is there another method that returns only the names of the files in a specified directory? Please help me with this. Thanks.

You can indeed use getPath(). It returns a Path object, which lets you query the name of the file.

while (ritr.hasNext()) {
    Path p = ritr.next().getPath();
    // returns the file name, or the directory name if the path is a directory
    String name = p.getName();
    System.out.println(name);
}

The FileStatus object you get can also tell you whether the entry is a file or a directory (isDirectory(), or isDir() in the older 1.0.0 API).
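As for why the question's loop printed lines such as org.apache.hadoop.fs.LocatedFileStatus@d0085360: that shape (class name, "@", hex hash code) is what the default Object.toString() produces, which suggests the status class in that Hadoop version does not override toString(). The sketch below reproduces the effect using only the JDK. Status is a hypothetical stand-in, not a Hadoop class. Its getPath() and getName() only loosely mimic the Hadoop methods of the same names, and the file name part-00000.json is made up for illustration.

```java
public class ToStringDemo {

    /** Hypothetical stand-in for a status object; it does NOT override toString(). */
    static final class Status {
        private final String path;

        Status(String path) {
            this.path = path;
        }

        /** Returns the full path string (assumption: loosely analogous to getPath()). */
        String getPath() {
            return path;
        }

        /** Returns the text after the last '/' (assumption: loosely analogous to Path.getName()). */
        String getName() {
            int slash = path.lastIndexOf('/');
            return path.substring(slash + 1);
        }
    }

    public static void main(String[] args) {
        // Made-up file name under the directory used in the question.
        Status s = new Status("/user/myusername/inputjson1/part-00000.json");

        // Default Object.toString(): class name, "@", then the hex hash code.
        System.out.println(s.toString());

        // Calling the accessors yields the readable values instead.
        System.out.println(s.getPath());
        System.out.println(s.getName());
    }
}
```

The first line printed has the same unreadable shape as the output in the question, while the other two print the full path and the bare file name. With the real API the equivalent fix is the one shown above: call getPath() on each status object, then getName() on the resulting Path.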

Here is more API documentation:

http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/Path.html

http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/fs/FileStatus.html
