
List folders and files of HDFS using Java

I am trying to list all directories and files in HDFS using Java.

Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://ip address"), configuration);
FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://ip address/user/uname/"));
Path[] paths = FileUtil.stat2Paths(fileStatus);
for(FileStatus status : fileStatus){
    System.out.println(status.getPath().toString());
}

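My code is able to create the fs object, but it gets stuck at line 3, where it tries to read the folders and files of the directory. I am using AWS.

Please help me solve this problem.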

This works for me:

public static void main(String[] args) throws IOException, URISyntaxException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000/"), conf);
    FileStatus[] fileStatus = fs.listStatus(new Path("hdfs://localhost:9000/"));
    for(FileStatus status : fileStatus){
        System.out.println(status.getPath().toString());
    }
}

Output

hdfs://localhost:9000/All.txt
hdfs://localhost:9000/department.txt
hdfs://localhost:9000/emp.tsv
hdfs://localhost:9000/employee.txt
hdfs://localhost:9000/hbase

I think the URI you are passing is incorrect. Try following the code above.
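As a rough illustration of what an "incorrect URI" can mean here, the sketch below checks an HDFS URI string with the standard `java.net.URI` class before it is handed to `FileSystem.get`. The class name `HdfsUriCheck` and the method `diagnose` are hypothetical helpers, not part of Hadoop, and the checks are assumptions about what is worth verifying (syntax, scheme, host, explicit port). It does not contact the cluster, so it cannot detect an unreachable NameNode.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HdfsUriCheck {
    // Hypothetical helper: returns "ok" when the string parses as a URI with
    // the hdfs scheme, a host, and an explicit port; otherwise a short reason.
    public static String diagnose(String raw) {
        URI uri;
        try {
            uri = new URI(raw);
        } catch (URISyntaxException e) {
            // e.g. a placeholder containing a space such as "hdfs://ip address"
            return "invalid syntax: " + e.getReason();
        }
        if (!"hdfs".equals(uri.getScheme())) {
            return "scheme is not hdfs";
        }
        if (uri.getHost() == null) {
            return "missing host";
        }
        if (uri.getPort() == -1) {
            // Not necessarily an error; whether it matters depends on the cluster setup.
            return "no explicit port";
        }
        return "ok";
    }

    public static void main(String[] args) {
        System.out.println(diagnose("hdfs://localhost:9000/"));
        System.out.println(diagnose("hdfs://ip address"));
        System.out.println(diagnose("hdfs://localhost/"));
    }
}
```

If the string passes these checks and `listStatus` still hangs, the cause is more likely outside the URI itself, for example the host or port not being reachable from the client machine.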

If the conf is not set up, you must add the resource files:

conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/home/kishore/BigData/hadoop/etc/hadoop/hdfs-site.xml"));

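Check the following method, which gets the list of files using either a recursive or a non-recursive approach. To get a list of directories instead, you can change the code so that directory paths are added to the result list rather than files. See the fs.isDirectory() if-else clause for extracting directory paths. The FileStatus class also has an isDirectory() method to check whether a FileStatus instance refers to a directory.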

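    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.Queue;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    //helper method to get the list of files from the HDFS path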
    public static List<String> 
        listFilesFromHDFSPath(Configuration hadoopConfiguration,
                              String hdfsPath,
                              boolean recursive) throws IOException, 
                                            IllegalArgumentException
    {
        //resulting list of files
        List<String> filePaths = new ArrayList<String>();

        //get path from string and then the filesystem
        Path path = new Path(hdfsPath);  //throws IllegalArgumentException
        FileSystem fs = path.getFileSystem(hadoopConfiguration);

        //if recursive approach is requested
        if(recursive)
        {
            //use a queue instead of recursive calls to avoid deep call stacks
            Queue<Path> fileQueue = new LinkedList<Path>();

            //add the obtained path to the queue
            fileQueue.add(path);

            //while the fileQueue is not empty
            while (!fileQueue.isEmpty())
            {
                //get the file path from queue
                Path filePath = fileQueue.remove();

                //filePath refers to a file
                if (fs.isFile(filePath))
                {
                    filePaths.add(filePath.toString());
                }
                else   //else filePath refers to a directory
                {
                    //list paths in the directory and add to the queue
                    FileStatus[] fileStatuses = fs.listStatus(filePath);
                    for (FileStatus fileStatus : fileStatuses)
                    {
                        fileQueue.add(fileStatus.getPath());
                    } // for
                } // else

            } // while

        } // if
        else        //non-recursive approach => only direct children are listed
        {
            //if the given hdfsPath is actually directory
            if(fs.isDirectory(path))
            {
                FileStatus[] fileStatuses = fs.listStatus(path);

                //loop all file statuses
                for(FileStatus fileStatus : fileStatuses)
                {
                    //if the given status is a file, then update the resulting list
                    if(fileStatus.isFile())
                        filePaths.add(fileStatus.getPath().toString());
                } // for
            } // if
            else        //it is a file then
            {
                //return the one and only file path to the resulting list
                filePaths.add(path.toString());
            } // else

        } // else

        //do not close fs here: getFileSystem() normally returns a cached, shared
        //instance, and closing it can break other code that uses the same instance

        //return the resulting list
        return filePaths;
    } // listFilesFromHDFSPath
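To make the suggested change for directories concrete, the sketch below isolates the queue-based traversal and collects directories instead of files. It is not the answer's code: the HDFS calls are replaced by a hypothetical in-memory map (`tree`) from a directory path to its direct children, so the logic can be run without a cluster. In real code, the `tree.get(current) == null` check would correspond to a directory test such as `FileStatus.isDirectory()`, and the child lookup to `fs.listStatus(...)`. The class and method names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class DirectoryWalkSketch {
    // Breadth-first walk that records directories rather than files.
    // 'tree' is a stand-in for HDFS: a key is a directory path and its
    // value lists that directory's direct children (files or directories).
    public static List<String> listDirectories(Map<String, List<String>> tree, String root) {
        List<String> directories = new ArrayList<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> children = tree.get(current);
            if (children == null) {
                continue; // no entry in the map: treat it as a file and skip it
            }
            directories.add(current);   // it is a directory: keep it
            queue.addAll(children);     // and visit its children later
        }
        return directories;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        tree.put("/", Arrays.asList("/All.txt", "/hbase", "/user"));
        tree.put("/hbase", new ArrayList<>());
        tree.put("/user", Arrays.asList("/user/uname"));
        tree.put("/user/uname", Arrays.asList("/user/uname/emp.tsv"));
        System.out.println(listDirectories(tree, "/"));
    }
}
```

Because the queue is processed first-in, first-out, directories come back level by level, starting with the root itself.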
