
Using static variable of another class in Hadoop Cluster

My Hadoop program is shown below; I've included the relevant code snippets. I pass an argument that sets BIG_DATA to true in the driver, and "Working on BIG DATA." is printed there. But in the map method of the RowMPreMap class, the value of BIG_DATA is its initial value of false. I'm not sure why this is happening. Am I missing something? This works when I run it on a standalone machine, but not on a Hadoop cluster. The jobs are handled by JobControl. Is it something to do with threads?

public class UVDriver extends Configured implements Tool {

    public static class RowMPreMap extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {

        private Text keyText = new Text();
        private Text valText = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {

            // Input: (lineNo, lineContent)

            // Split each line using a separator based on the dataset.
            String line[] = null;
            if (Settings.BIG_DATA)
                line = value.toString().split("::");
            else
                line = value.toString().split("\\s");

            keyText.set(line[0]);
            valText.set(line[1] + "," + line[2]);

            // Output: (userid, "movieid,rating")
            output.collect(keyText, valText);

        }
    }

    public static class Settings {

        public static boolean BIG_DATA = false;

        public static int noOfUsers = 0;
        public static int noOfMovies = 0;

        public static final int noOfCommonFeatures = 10;
        public static final int noOfIterationsRequired = 3;
        public static final float INITIAL_VALUE = 0.1f;

        public static final String NORMALIZE_DATA_PATH_TEMP = "normalize_temp";
        public static final String NORMALIZE_DATA_PATH = "normalize";
        public static String INPUT_PATH = "input";
        public static String OUTPUT_PATH = "output";
        public static String TEMP_PATH = "temp";

    }

    public static class Constants {

        public static final int BIG_DATA_USERS = 71567;
        public static final int BIG_DATA_MOVIES = 10681;
        public static final int SMALL_DATA_USERS = 943;
        public static final int SMALL_DATA_MOVIES = 1682;

        public static final int M_Matrix = 1;
        public static final int U_Matrix = 2;
        public static final int V_Matrix = 3;
    }

    public int run(String[] args) throws Exception {

        // 1. Pre-process the data.
        // a) Normalize
        // 2. Initialize the U, V Matrices
        // a) Initialize U Matrix
        // b) Initialize V Matrix
        // 3. Iterate to update U and V.

        // Write Job details for each of the above steps.

        Settings.INPUT_PATH = args[0];
        Settings.OUTPUT_PATH = args[1];
        Settings.TEMP_PATH = args[2];
        Settings.BIG_DATA = Boolean.parseBoolean(args[3]);

        if (Settings.BIG_DATA) {
            System.out.println("Working on BIG DATA.");
            Settings.noOfUsers = Constants.BIG_DATA_USERS;
            Settings.noOfMovies = Constants.BIG_DATA_MOVIES;
        } else {
            System.out.println("Working on Small DATA.");
            Settings.noOfUsers = Constants.SMALL_DATA_USERS;
            Settings.noOfMovies = Constants.SMALL_DATA_MOVIES;
        }

            // some code here

            handleRun(control);


        return 0;
    }

    public static void main(String args[]) throws Exception {

        System.out.println("Program started");
        if (args.length != 4) {
            System.err
                    .println("Usage: UVDriver <input path> <output path> <fs path> <big data: true|false>");
            System.exit(-1);
        }

        Configuration configuration = new Configuration();
        String[] otherArgs = new GenericOptionsParser(configuration, args)
                .getRemainingArgs();
        ToolRunner.run(new UVDriver(), otherArgs);
        System.out.println("Program complete.");
        System.exit(0);
    }

}

Job control:

public static class JobRunner implements Runnable {
        private JobControl control;

        public JobRunner(JobControl _control) {
            this.control = _control;
        }

        public void run() {
            this.control.run();
        }
    }

    public static void handleRun(JobControl control)
            throws InterruptedException {
        JobRunner runner = new JobRunner(control);
        Thread t = new Thread(runner);
        t.start();

        int i = 0;
        while (!control.allFinished()) {
            if (i % 20 == 0) {
                System.out
                        .println(new Date().toString() + ": Still running...");
                System.out.println("Running jobs: "
                        + control.getRunningJobs().toString());
                System.out.println("Waiting jobs: "
                        + control.getWaitingJobs().toString());
                System.out.println("Successful jobs: "
                        + control.getSuccessfulJobs().toString());
            }
            Thread.sleep(1000);
            i++;
        }

        if (control.getFailedJobs() != null) {
            System.out.println("Failed jobs: "
                    + control.getFailedJobs().toString());
        }
    }

This won't work because the scope of the static modifier doesn't span multiple JVM instances, much less a network.

On a cluster, a map task always runs in a separate JVM from the driver, even if it happens to run on the same machine as the tool runner. The mapper class is instantiated from its class name alone, so it has no access to the values you set in your tool runner. In standalone (local) mode the tasks run inside the driver's JVM, which is why the static field appears to work there.

This is one reason why the configuration framework exists.
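To make the idea concrete, here is a minimal, stdlib-only sketch of passing the flag through a serialized key/value configuration instead of a static field. It is not the asker's code and does not use Hadoop itself: `java.util.Properties` is only a stand-in for the job configuration, and the key name `uvdriver.big.data` and the class names are made up for illustration. In the old `mapred` API used in the question, the corresponding calls would be `JobConf.setBoolean(...)` in the driver and an override of `configure(JobConf job)` in the mapper that reads the value with `job.getBoolean(...)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: Properties stands in for Hadoop's job configuration.
class ConfigPassingSketch {

    // Hypothetical key name; any unique string would do.
    static final String BIG_DATA_KEY = "uvdriver.big.data";

    // Driver side: store the flag in the configuration and serialize it,
    // roughly as a job's configuration is shipped to the task JVMs.
    // With Hadoop this would be conf.setBoolean(BIG_DATA_KEY, bigData).
    static byte[] driverSide(boolean bigData) {
        Properties conf = new Properties();
        conf.setProperty(BIG_DATA_KEY, Boolean.toString(bigData));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Task side: the mapper keeps the flag in an instance field that is
    // filled in from the configuration, never from a static of the driver.
    static class RowPreMapSketch {
        private boolean bigData;

        // Analogous to overriding configure(JobConf job) and calling
        // job.getBoolean(BIG_DATA_KEY, false).
        void configure(Properties conf) {
            bigData = Boolean.parseBoolean(conf.getProperty(BIG_DATA_KEY, "false"));
        }

        String[] split(String line) {
            return bigData ? line.split("::") : line.split("\\s");
        }
    }

    // Rebuild the configuration from the shipped bytes and configure a mapper.
    static RowPreMapSketch taskSide(byte[] shipped) {
        Properties conf = new Properties();
        try {
            conf.load(new ByteArrayInputStream(shipped));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        RowPreMapSketch mapper = new RowPreMapSketch();
        mapper.configure(conf);
        return mapper;
    }

    public static void main(String[] args) {
        RowPreMapSketch mapper = taskSide(driverSide(true));
        System.out.println(String.join(",", mapper.split("1::1193::5"))); // prints 1,1193,5
    }
}
```

The point of the design is that everything the mapper needs travels with the job as data, so it does not matter which JVM or machine the task ends up running on.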
