Reading file from a remote HDFS

I am trying to read a file from a remote HDFS cluster and print its contents to the console of my local machine. Note that the local machine can connect to the HDFS nodes only over SSH, using a key in the form of a .pem file.

When I execute the code shown below, the program runs, stays idle for some time, and finally displays:

BlockMissingException : Could not obtain block

My code:

try {
    // Act as the remote HDFS user for every call made inside run()
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("remoteUser");

    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            conf = new Configuration();
            conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
            conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
            // fs.default.name is the deprecated name of fs.defaultFS
            conf.set("fs.default.name", hdfsurl);
            conf.set("fs.defaultFS", hdfsurl);
            conf.set("hadoop.job.ugi", "remoteUser");
            conf.set("hadoop.ssl.enabled", "false");
            readFromHDFS(hdfsurl);
            return null;
        }
    });
} catch (Exception e) {
    // The handler body is cut off in the post; printing the stack trace is a placeholder
    e.printStackTrace();
}

// Reads the file at hdfsURL line by line and prints each line to the console.
public static void readFromHDFS(String hdfsURL) throws Exception {
    Path path = new Path(hdfsURL);
    // try-with-resources closes the file system even if reading fails
    try (FileSystem fileSystem = FileSystem.get(conf)) {
        if (!fileSystem.exists(path)) {
            System.out.println("File does not exist");
            return;
        }
        try (FSDataInputStream in = fileSystem.open(path);
             Scanner sc = new Scanner(in)) {
            while (sc.hasNextLine()) {
                System.out.println("line read from hdfs...." + sc.nextLine());
            }
        }
    }
}

1) Run hadoop fsck HDFS_FILE (hdfs fsck HDFS_FILE on Hadoop 2 and later) to check whether that HDFS file is healthy. If it is not, the file is corrupted: remove the corrupted file, then run the command below.

2) Run hadoop dfsadmin -report (hdfs dfsadmin -report on Hadoop 2 and later) and check that it reports Missing blocks: 0.
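If the check in step 2 has to be repeated, it could be automated by scanning the saved report text for the Missing blocks line. The following is only a minimal sketch: the class name MissingBlocksReport, the method parseMissingBlocks, and the sample report text are hypothetical, and it assumes the report contains a line of the form "Missing blocks: N" as described above. It does not run the hadoop command itself.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the "Missing blocks" count from report text. */
final class MissingBlocksReport {

    // Matches a whole line such as "Missing blocks: 0". A line that has extra text
    // between "blocks" and the colon is deliberately not matched.
    private static final Pattern MISSING_BLOCKS =
            Pattern.compile("^\\s*Missing blocks:\\s*(\\d+)\\s*$", Pattern.MULTILINE);

    /** Returns the count from the first matching line, or empty if no such line exists. */
    static OptionalLong parseMissingBlocks(String reportText) {
        Matcher m = MISSING_BLOCKS.matcher(reportText);
        return m.find()
                ? OptionalLong.of(Long.parseLong(m.group(1)))
                : OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Sample text made up for illustration; a real report has many more fields.
        String sample = String.join("\n",
                "Under replicated blocks: 3",
                "Missing blocks: 2",
                "Missing blocks (with replication factor 1): 0");

        OptionalLong missing = parseMissingBlocks(sample);
        if (!missing.isPresent()) {
            System.out.println("No 'Missing blocks' line found in the report");
        } else if (missing.getAsLong() == 0) {
            System.out.println("No missing blocks reported");
        } else {
            System.out.println("Missing blocks reported: " + missing.getAsLong());
        }
    }
}
```

In practice the report text would come from redirecting the output of the command in step 2 to a file and reading that file into a string; a non-zero count suggests running the fsck check from step 1 on the affected files.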
