Programmatically reading the contents of a text file stored in HDFS using Java

Question

How do I run this simple Java program, which reads bytes from a text file stored in the HDFS directory /words? Do I need to create a jar file for this?

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.hadoop.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class filesystemhdfs 
{
    public static void main(String args[]) throws MalformedURLException, IOException
    {
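        // read() needs an allocated buffer; passing a null array throws NullPointerException
        byte[] b = new byte[4096];
        InputStream in = new URL("hdfs://localhost/words/file").openStream();
        int n = in.read(b);   // number of bytes actually read, or -1 at end of stream
        for (int i = 0; i < n; i++)
        {
            System.out.println("b[" + i + "]=" + b[i]);
            System.out.println((char) b[i]);
        }
        in.close();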
    }
}

Answer 1

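You can use the HDFS API, which can also be run from a local machine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Point the client at the NameNode, then open the file as a stream
Configuration configuration = new Configuration();
configuration.set("fs.defaultFS", "hdfs://namenode:8020");
FileSystem fs = FileSystem.get(configuration);
Path filePath = new Path("hdfs://namenode:8020/PATH");

FSDataInputStream fsDataInputStream = fs.open(filePath);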
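
The snippet above stops once the stream is open. Since FSDataInputStream is a java.io.InputStream, the remaining work can be plain JDK code. What follows is only a minimal sketch under that assumption: the class name LineReaderSketch, the method readLines, and the choice of UTF-8 are illustrative and not part of the answer, and a ByteArrayInputStream stands in for fs.open(filePath) so the example runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hadoop or of the answer above.
final class LineReaderSketch {

    // Reads every line from the stream, decoding it as UTF-8 (an assumption
    // about the file's encoding), and closes the stream when done.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(filePath); with Hadoop on the classpath the
        // FSDataInputStream could be passed to readLines directly.
        InputStream in = new ByteArrayInputStream(
                "hello\nworld\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(in)) {
            System.out.println(line);
        }
    }
}
```

Closing the reader through try-with-resources also closes the wrapped stream, so no separate close() call is needed for it.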

Answer 2

First, you need to tell the JVM about the hdfs scheme for URL objects. This is done with:

URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());

After compiling the Java class, you need to run it with the hadoop command:

hadoop filesystemhdfs

Hadoop comes with a handy IOUtils class. It will take a lot of the work off your hands.
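
As a rough illustration of what such a helper saves you: Hadoop's IOUtils.copyBytes(in, out, bufferSize, close) performs a buffered copy from one stream to another. The JDK-only sketch below writes that kind of loop by hand. CopySketch and copy are hypothetical names, and the ByteArrayInputStream stands in for the stream returned by URL.openStream() so that it runs without Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a hand-written stand-in, not Hadoop's own code.
final class CopySketch {

    // Copies everything from in to out through a fixed-size buffer and
    // returns the number of bytes copied. Neither stream is closed here.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns how many bytes it filled, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new URL("hdfs://...").openStream()
        try (InputStream in = new ByteArrayInputStream(
                "some words\n".getBytes(StandardCharsets.UTF_8))) {
            copy(in, System.out, 4096);
        }
    }
}
```

The loop writes only the n bytes that read() reported, which matters because a single read() call is allowed to fill less than the whole buffer.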

Answer 3

You cannot read a file from HDFS the way you would from a regular filesystem that Java supports. You need to use the HDFS Java API for this.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ReadHdfsFile {   // class name is arbitrary

    public static void main(String[] a) {
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("root");

        try {
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    // fs.default.name should match the corresponding value
                    // in core-site.xml on your Hadoop cluster
                    // (newer Hadoop releases call this key fs.defaultFS)
                    conf.set("fs.default.name", "hdfs://hostname:9000");
                    conf.set("hadoop.job.ugi", "root");

                    readFile("words/file", conf);

                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void readFile(String file, Configuration conf) throws IOException {
        FileSystem fileSystem = FileSystem.get(conf);

        Path path = new Path(file);
        if (!ifExists(fileSystem, path)) {
            System.out.println("File " + file + " does not exist");
            return;
        }

        // try-with-resources closes the reader and, with it, the underlying HDFS stream
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(fileSystem.open(path)))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
        fileSystem.close();
    }

    // Checks whether the path exists on the given FileSystem
    public static boolean ifExists(FileSystem hdfs, Path source) throws IOException {
        boolean isExists = hdfs.exists(source);
        System.out.println(isExists);
        return isExists;
    }
}

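Here I am connecting to a remote machine, which is why I use UserGroupInformation and put the code inside the run method of PrivilegedExceptionAction. If you are on the local system, you may not need it. Hope that helps!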

Answer 4

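A bit late to reply, but this may help future readers. It iterates over your HDFS directory and reads the content of each file.

Only the Hadoop client and Java are used.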

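// IOUtils.toString(InputStream, String) is from org.apache.commons.io.IOUtils (Apache Commons IO)
Configuration conf = new Configuration();
conf.addResource(new Path("/your/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/your/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
FileStatus[] status = fs.listStatus(new Path("hdfs://path/to/your/hdfs/directory"));
for (int i = 0; i < status.length; i++) {
    if (!status[i].isFile()) {
        continue;   // skip subdirectories; only files can be opened
    }
    // try-with-resources closes each stream once its content has been read
    try (FSDataInputStream inputStream = fs.open(status[i].getPath())) {
        String content = IOUtils.toString(inputStream, "UTF-8");
    }
}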

4 answers

Answer 1
3 2014-02-25 16:47:59

Answer 2
1 2014-02-25 16:46:53

Answer 3
0 2014-02-26 12:56:56

Answer 4
0 2019-10-22 03:24:55
