How to get the HDFS server metadata information in a Java client?

I need to build a utility class to test the connection to HDFS. The test should display the server-side version of HDFS and any other metadata. Although there are plenty of client demos available, there is nothing on extracting the server metadata. Could anybody help?

Please note that my client is a remote Java client and doesn't have the Hadoop and HDFS config files to initialise the configuration. I need to do it by connecting to the HDFS NameNode service using its URL on the fly.

All Hadoop nodes expose a JMX interface, and one of the things you can get via JMX is the version. A good way to start is to run Hadoop on your localhost, attach jconsole to a node, explore the interface, and copy and paste the object names of the MBeans. Unfortunately, there is almost no documentation about Hadoop's JMX interface.

By the way, the NameNode provides the most useful information.
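For illustration, here is a minimal sketch of reading the version over JMX. It assumes remote JMX has been enabled on the NameNode JVM (the port 8004 is hypothetical), and that the MBean name Hadoop:service=NameNode,name=NameNodeInfo and its Version attribute were found by browsing with jconsole as suggested above; these names may differ between Hadoop versions:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class NNJmxVersion {

    public static void main(String[] args) throws Exception {
        // assumes the NameNode was started with remote JMX enabled, e.g.
        // -Dcom.sun.management.jmxremote.port=8004 (port is hypothetical)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://myhost:8004/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // MBean and attribute names copied from jconsole as suggested
            // above; they are not guaranteed to be stable across versions
            ObjectName nnInfo = new ObjectName(
                    "Hadoop:service=NameNode,name=NameNodeInfo");
            System.out.println("Version: "
                    + mbsc.getAttribute(nnInfo, "Version"));
        } finally {
            connector.close();
        }
    }
}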

Hadoop also exposes some information over HTTP that you can use. See Cloudera's article. Probably the easiest way would be to connect to the NN UI and parse the content returned by the server:

URL url = new URL("http://myhost:50070/dfshealth.jsp");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String line;
while ((line = in.readLine()) != null) {
    // scrape the returned HTML for the version and other metadata
}

On the other hand, if you know the addresses of the NN and the JT, you can connect to them with a simple client like this (Hadoop 0.20.0-r1056497):

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.FSConstants.DatanodeReportType;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.util.VersionInfo;

public class NNConnTest {

    private enum NNStats {

        STATS_CAPACITY_IDX(0, 
                "Total storage capacity of the system, in bytes: ");
        //... see org.apache.hadoop.hdfs.protocol.ClientProtocol 

        private int id;
        private String desc;

        private NNStats(int id, String desc) {
            this.id = id;
            this.desc = desc;
        }

        public String getDesc() {
            return desc;
        }

        public int getId() {
            return id;
        }

    }

    private enum ClusterStats {

        //see org.apache.hadoop.mapred.ClusterStatus API docs
        USED_MEM {
            @Override
            public String getDesc() {
                String desc = "Total heap memory used by the JobTracker: ";
                return desc + clusterStatus.getUsedMemory();
            }
        };

        private static ClusterStatus clusterStatus;
        public static void setClusterStatus(ClusterStatus stat) {
            clusterStatus = stat;
        }

        public abstract String getDesc();
    }


    public static void main(String[] args) throws Exception {

        InetSocketAddress namenodeAddr = new InetSocketAddress("myhost",8020);
        InetSocketAddress jobtrackerAddr = new InetSocketAddress("myhost",8021);

        Configuration conf = new Configuration();

        //query NameNode
        DFSClient client = new DFSClient(namenodeAddr, conf);
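        // note: in Hadoop 0.20, DFSClient exposes the NameNode RPC proxy
        // as a public field; later versions hide it behind accessors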
        ClientProtocol namenode = client.namenode;
        long[] stats = namenode.getStats();

        System.out.println("NameNode info: ");
        for (NNStats sf : NNStats.values()) {
            System.out.println(sf.getDesc() + stats[sf.getId()]);
        }

        //query JobTracker
        JobClient jobClient = new JobClient(jobtrackerAddr, conf); 
        ClusterStatus clusterStatus = jobClient.getClusterStatus(true);

        System.out.println("\nJobTracker info: ");
        System.out.println("State: " + 
                clusterStatus.getJobTrackerState().toString());

        ClusterStats.setClusterStatus(clusterStatus);
        for (ClusterStats cs : ClusterStats.values()) {
            System.out.println(cs.getDesc());
        }

        System.out.println("\nHadoop build version: " 
                + VersionInfo.getBuildVersion());

        //query Datanodes
        System.out.println("\nDataNode info: ");
        DatanodeInfo[] datanodeReport = namenode.getDatanodeReport(
                DatanodeReportType.ALL);
        for (DatanodeInfo di : datanodeReport) {
            System.out.println("Host: " + di.getHostName());
            System.out.println(di.getDatanodeReport());
        }

    }

}

Make sure that your client uses the same Hadoop version as your cluster does; otherwise an EOFException can occur.
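To see what version your client was built against before connecting, you can print the client-side build information using the same VersionInfo class as in the example above; a quick sketch:

// client-side build information from org.apache.hadoop.util.VersionInfo;
// compare this against the version reported by the cluster
System.out.println("Client version: " + VersionInfo.getVersion());
System.out.println("Client build:   " + VersionInfo.getBuildVersion());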
