
Using static variable of another class in Hadoop Cluster

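I have a Hadoop program, shown below with the relevant code excerpted. I pass an argument that is read in main and sets BIG_DATA to true, and "Working on BIG DATA." is printed. But inside the map method of the RowMPreMap class, BIG_DATA still has its initial value, false. I'm not sure why this happens. Am I missing something? The code works when I run it on a standalone machine, but not when I run it on a Hadoop cluster. The jobs are handled by JobControl. Does it have something to do with threads?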

public class UVDriver extends Configured implements Tool {

    public static class RowMPreMap extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {

        private Text keyText = new Text();
        private Text valText = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {

            // Input: (lineNo, lineContent)

            // Split each line using the separator for the dataset.
            String line[] = null;
            if (Settings.BIG_DATA)
                line = value.toString().split("::");
            else
                line = value.toString().split("\\s");

            keyText.set(line[0]);
            valText.set(line[1] + "," + line[2]);

            // Output: (userid, "movieid,rating")
            output.collect(keyText, valText);

        }
    }

    public static class Settings {

        public static boolean BIG_DATA = false;

        public static int noOfUsers = 0;
        public static int noOfMovies = 0;

        public static final int noOfCommonFeatures = 10;
        public static final int noOfIterationsRequired = 3;
        public static final float INITIAL_VALUE = 0.1f;

        public static final String NORMALIZE_DATA_PATH_TEMP = "normalize_temp";
        public static final String NORMALIZE_DATA_PATH = "normalize";
        public static String INPUT_PATH = "input";
        public static String OUTPUT_PATH = "output";
        public static String TEMP_PATH = "temp";

    }

    public static class Constants {

        public static final int BIG_DATA_USERS = 71567;
        public static final int BIG_DATA_MOVIES = 10681;
        public static final int SMALL_DATA_USERS = 943;
        public static final int SMALL_DATA_MOVIES = 1682;

        public static final int M_Matrix = 1;
        public static final int U_Matrix = 2;
        public static final int V_Matrix = 3;
    }

    public int run(String[] args) throws Exception {

        // 1. Pre-process the data.
        // a) Normalize
        // 2. Initialize the U, V Matrices
        // a) Initialize U Matrix
        // b) Initialize V Matrix
        // 3. Iterate to update U and V.

        // Write Job details for each of the above steps.

        Settings.INPUT_PATH = args[0];
        Settings.OUTPUT_PATH = args[1];
        Settings.TEMP_PATH = args[2];
        Settings.BIG_DATA = Boolean.parseBoolean(args[3]);

        if (Settings.BIG_DATA) {
            System.out.println("Working on BIG DATA.");
            Settings.noOfUsers = Constants.BIG_DATA_USERS;
            Settings.noOfMovies = Constants.BIG_DATA_MOVIES;
        } else {
            System.out.println("Working on Small DATA.");
            Settings.noOfUsers = Constants.SMALL_DATA_USERS;
            Settings.noOfMovies = Constants.SMALL_DATA_MOVIES;
        }

        // some code here

        handleRun(control);

        return 0;
    }

    public static void main(String args[]) throws Exception {

        System.out.println("Program started");
        if (args.length != 4) {
            // Four arguments are required; the fourth is parsed as BIG_DATA.
            System.err
                    .println("Usage: UVDriver <input path> <output path> <fs path> <big data: true|false>");
            System.exit(-1);
        }

        Configuration configuration = new Configuration();
        String[] otherArgs = new GenericOptionsParser(configuration, args)
                .getRemainingArgs();
        ToolRunner.run(new UVDriver(), otherArgs);
        System.out.println("Program complete.");
        System.exit(0);
    }

}

The job control code:

    public static class JobRunner implements Runnable {
        private JobControl control;

        public JobRunner(JobControl _control) {
            this.control = _control;
        }

        public void run() {
            this.control.run();
        }
    }

    public static void handleRun(JobControl control)
            throws InterruptedException {
        JobRunner runner = new JobRunner(control);
        Thread t = new Thread(runner);
        t.start();

        int i = 0;
        while (!control.allFinished()) {
            if (i % 20 == 0) {
                System.out
                        .println(new Date().toString() + ": Still running...");
                System.out.println("Running jobs: "
                        + control.getRunningJobs().toString());
                System.out.println("Waiting jobs: "
                        + control.getWaitingJobs().toString());
                System.out.println("Successful jobs: "
                        + control.getSuccessfulJobs().toString());
            }
            Thread.sleep(1000);
            i++;
        }

        // getFailedJobs() returns a list, so check for emptiness rather than null.
        if (!control.getFailedJobs().isEmpty()) {
            System.out.println("Failed jobs: "
                    + control.getFailedJobs().toString());
        }
    }

This doesn't work because the scope of the static modifier doesn't span multiple instances of the JVM (let alone a network).

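On a cluster, a map task always runs in a separate JVM, even if it runs on the same machine as the tool runner. The mapper class is instantiated using only the class name and has no access to the values you set in the tool runner. In standalone mode the local job runner executes tasks inside the driver's own JVM, which is why the static field appeared to work there.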

This is one of the reasons the configuration framework exists.
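As a minimal sketch of that approach, using the old `org.apache.hadoop.mapred` API that the question uses: the driver stores the flag in the job's `JobConf`, and the mapper reads it back in `configure()`, which the framework calls in the task JVM before any call to `map()`. The class name `ConfiguredRowMapper`, the property key `uvdriver.big.data`, and the helper `setBigData` are hypothetical names chosen for illustration; they are not part of the original program.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConfiguredRowMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Hypothetical property key, assumed not to clash with Hadoop's own keys.
    public static final String BIG_DATA_KEY = "uvdriver.big.data";

    private final Text keyText = new Text();
    private final Text valText = new Text();

    // Defaults to the small-dataset separator until configure() runs.
    private String separator = "\\s";

    // Driver side: store the flag in the JobConf before the job is submitted.
    public static void setBigData(JobConf conf, boolean bigData) {
        conf.setBoolean(BIG_DATA_KEY, bigData);
    }

    // Task side: the JobConf is shipped with the job, so the flag is
    // available here even though the driver's static fields are not.
    @Override
    public void configure(JobConf job) {
        separator = job.getBoolean(BIG_DATA_KEY, false) ? "::" : "\\s";
    }

    // Exposed so the chosen separator can be inspected.
    String separator() {
        return separator;
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String[] line = value.toString().split(separator);
        keyText.set(line[0]);
        valText.set(line[1] + "," + line[2]);
        // Output: (userid, "movieid,rating")
        output.collect(keyText, valText);
    }
}
```

In the driver's `run` method, the flag would then be set on each job's `JobConf` (for example `ConfiguredRowMapper.setBigData(conf, Boolean.parseBoolean(args[3]))`) before the job is added to the `JobControl`. Values set on the configuration after submission do not reach the tasks.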
