
List folders and files in HDFS using Java

I am trying to list all the directories and files in HDFS using Java.

Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for(FileStatus status : fileStatus){
    System.out.println(status.getPath().toString());
}

My code is able to create the fs object, but it gets stuck on line 3, where it tries to read the folders and files. I am using AWS.

Please help me to resolve the issue.

This is working for me:

public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for(FileStatus status : fileStatus){
        System.out.println(status.getPath().toString());
    }
}

Output:

hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase

I think you are giving an incorrect URI. Try following the code above.
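One quick way to rule out a malformed URI before involving Hadoop at all is to parse the string with plain java.net.URI. This is only a minimal sketch: HdfsUriCheck and looksLikeHdfsUri are hypothetical names, not part of the Hadoop API, and the check only covers syntax (scheme and host). It cannot tell whether the NameNode is actually reachable.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop: sanity-checks an HDFS URI string
// before it is handed to FileSystem.get(...).
class HdfsUriCheck {

    // Returns true when the string parses as a URI with scheme "hdfs" and a host.
    static boolean looksLikeHdfsUri(String value) {
        try {
            URI uri = new URI(value);
            return "hdfs".equalsIgnoreCase(uri.getScheme()) && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // For example, a space inside the authority part is rejected by java.net.URI
            return false;
        }
    }

    public static void main(String[] args) {
        // The three inputs: a well-formed URI, one containing a space, one without a scheme
        System.out.println(looksLikeHdfsUri("hdfs://localhost:9000/"));
        System.out.println(looksLikeHdfsUri("hdfs://ip address"));
        System.out.println(looksLikeHdfsUri("localhost:9000"));
    }
}
```

If the check passes and listStatus still hangs, the cause is more likely network access to the NameNode host and port than the URI syntax.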

If the configuration is not picked up automatically, you have to add the Hadoop resource files explicitly:

conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));

The following method returns a list of files using either a recursive or a non-recursive approach. To get a list of directories instead, change the code so that it adds directory paths to the resulting list rather than file paths; see the fs.isDirectory() if-else clauses in the code. The FileStatus class also has an isDirectory() method to check whether a FileStatus instance refers to a directory.

    //helper method to get the list of files from the HDFS path
    public static List<String> 
        listFilesFromHDFSPath(Configuration hadoopConfiguration,
                              String hdfsPath,
                              boolean recursive) throws IOException, 
                                            IllegalArgumentException
    {
        //resulting list of files
        List<String> filePaths = new ArrayList<String>();

        //get path from string and then the filesystem
        Path path = new Path(hdfsPath);  //throws IllegalArgumentException
        FileSystem fs = path.getFileSystem(hadoopConfiguration);

        //if recursive approach is requested
        if(recursive)
        {
            //(heap issues with recursive approach) => using a queue
            Queue<Path> fileQueue = new LinkedList<Path>();

            //add the obtained path to the queue
            fileQueue.add(path);

            //while the fileQueue is not empty
            while (!fileQueue.isEmpty())
            {
                //get the file path from queue
                Path filePath = fileQueue.remove();

                //filePath refers to a file
                if (fs.isFile(filePath))
                {
                    filePaths.add(filePath.toString());
                }
                else   //else filePath refers to a directory
                {
                    //list paths in the directory and add to the queue
                    FileStatus[] fileStatuses = fs.listStatus(filePath);
                    for (FileStatus fileStatus : fileStatuses)
                    {
                        fileQueue.add(fileStatus.getPath());
                    } // for
                } // else

            } // while

        } // if
        else        //non-recursive approach => no heap overhead
        {
            //if the given hdfsPath is actually directory
            if(fs.isDirectory(path))
            {
                FileStatus[] fileStatuses = fs.listStatus(path);

                //loop all file statuses
                for(FileStatus fileStatus : fileStatuses)
                {
                    //if the given status is a file, then update the resulting list
                    if(fileStatus.isFile())
                        filePaths.add(fileStatus.getPath().toString());
                } // for
            } // if
            else        //it is a file then
            {
                //return the one and only file path to the resulting list
                filePaths.add(path.toString());
            } // else

        } // else

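        //close filesystem; no more operations
        //(caution: FileSystem instances are cached by default, so this also
        // closes the instance for any other code sharing it)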
        fs.close();

        //return the resulting list
        return filePaths;
    } // listFilesFromHDFSPath
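
The change for collecting directories instead of files can be sketched as follows. To keep the example runnable without a Hadoop cluster, it is a local-filesystem analogue built on java.nio.file, not the Hadoop API. DirectoryLister and listDirectories are hypothetical names. In the HDFS version, the corresponding calls would be fs.listStatus(path) and FileStatus.isDirectory().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Local-filesystem analogue (java.nio.file, not the Hadoop API) of the
// queue-based traversal, changed to collect directories instead of files.
class DirectoryLister {

    // Returns every directory below root (root itself excluded), breadth-first.
    static List<String> listDirectories(Path root) {
        List<String> dirPaths = new ArrayList<>();
        Queue<Path> queue = new ArrayDeque<>();
        queue.add(root);

        while (!queue.isEmpty()) {
            Path current = queue.remove();
            // Counterpart of fs.listStatus(path) in the HDFS version
            try (DirectoryStream<Path> children = Files.newDirectoryStream(current)) {
                for (Path child : children) {
                    // Counterpart of FileStatus.isDirectory(); symbolic links are
                    // not followed, so a link cycle cannot loop forever
                    if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                        dirPaths.add(child.toString());
                        queue.add(child);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return dirPaths;
    }

    public static void main(String[] args) {
        // Uses the first argument as the root, or the current directory if none is given
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listDirectories(root).forEach(System.out::println);
    }
}
```

Only directories are put back on the queue, so files are never listed or visited a second time.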
