
Using a static variable of another class in a Hadoop cluster

I have a Hadoop program like the one below; I've included only the relevant snippets. I pass an argument that sets BIG_DATA to true, and the driver prints "Working on BIG DATA." But when execution reaches the map method in the RowMPreMap class, BIG_DATA still has its initial value, false. I'm not sure why this is happening. Am I missing something? This works when I run it on a standalone machine, but not on a Hadoop cluster. The jobs are handled by JobControl. Is it something to do with threads?

public class UVDriver extends Configured implements Tool {

    public static class RowMPreMap extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, Text> {

        private Text keyText = new Text();
        private Text valText = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {

            // Input: (lineNo, lineContent)

            // Split each line using the separator for the dataset.
            String line[] = null;
            if (Settings.BIG_DATA)
                line = value.toString().split("::");
            else
                line = value.toString().split("\\s");

            keyText.set(line[0]);
            valText.set(line[1] + "," + line[2]);

            // Output: (userid, "movieid,rating")
            output.collect(keyText, valText);

        }
    }

    public static class Settings {

        public static boolean BIG_DATA = false;

        public static int noOfUsers = 0;
        public static int noOfMovies = 0;

        public static final int noOfCommonFeatures = 10;
        public static final int noOfIterationsRequired = 3;
        public static final float INITIAL_VALUE = 0.1f;

        public static final String NORMALIZE_DATA_PATH_TEMP = "normalize_temp";
        public static final String NORMALIZE_DATA_PATH = "normalize";
        public static String INPUT_PATH = "input";
        public static String OUTPUT_PATH = "output";
        public static String TEMP_PATH = "temp";

    }

    public static class Constants {

        public static final int BIG_DATA_USERS = 71567;
        public static final int BIG_DATA_MOVIES = 10681;
        public static final int SMALL_DATA_USERS = 943;
        public static final int SMALL_DATA_MOVIES = 1682;

        public static final int M_Matrix = 1;
        public static final int U_Matrix = 2;
        public static final int V_Matrix = 3;
    }

    public int run(String[] args) throws Exception {

        // 1. Pre-process the data.
        // a) Normalize
        // 2. Initialize the U, V Matrices
        // a) Initialize U Matrix
        // b) Initialize V Matrix
        // 3. Iterate to update U and V.

        // Write Job details for each of the above steps.

        Settings.INPUT_PATH = args[0];
        Settings.OUTPUT_PATH = args[1];
        Settings.TEMP_PATH = args[2];
        Settings.BIG_DATA = Boolean.parseBoolean(args[3]);

        if (Settings.BIG_DATA) {
            System.out.println("Working on BIG DATA.");
            Settings.noOfUsers = Constants.BIG_DATA_USERS;
            Settings.noOfMovies = Constants.BIG_DATA_MOVIES;
        } else {
            System.out.println("Working on Small DATA.");
            Settings.noOfUsers = Constants.SMALL_DATA_USERS;
            Settings.noOfMovies = Constants.SMALL_DATA_MOVIES;
        }

        // some code here

        handleRun(control);

        return 0;
    }

    public static void main(String args[]) throws Exception {

        System.out.println("Program started");
        if (args.length != 4) {
            System.err
                    .println("Usage: UVDriver <input path> <output path> <temp path> <big data: true|false>");
            System.exit(-1);
        }

        Configuration configuration = new Configuration();
        String[] otherArgs = new GenericOptionsParser(configuration, args)
                .getRemainingArgs();
        // Pass the configuration along; otherwise the one built above is discarded.
        ToolRunner.run(configuration, new UVDriver(), otherArgs);
        System.out.println("Program complete.");
        System.exit(0);
    }

}

The JobControl handling code:

public static class JobRunner implements Runnable {
    private JobControl control;

    public JobRunner(JobControl _control) {
        this.control = _control;
    }

    public void run() {
        this.control.run();
    }
}

public static void handleRun(JobControl control)
        throws InterruptedException {
    JobRunner runner = new JobRunner(control);
    Thread t = new Thread(runner);
    t.start();

    int i = 0;
    while (!control.allFinished()) {
        // Print a status report roughly every 20 seconds.
        if (i % 20 == 0) {
            System.out
                    .println(new Date().toString() + ": Still running...");
            System.out.println("Running jobs: "
                    + control.getRunningJobs().toString());
            System.out.println("Waiting jobs: "
                    + control.getWaitingJobs().toString());
            System.out.println("Successful jobs: "
                    + control.getSuccessfulJobs().toString());
        }
        Thread.sleep(1000);
        i++;
    }

    // getFailedJobs() returns a (possibly empty) list, never null,
    // so test for emptiness rather than null.
    if (!control.getFailedJobs().isEmpty()) {
        System.out.println("Failed jobs: "
                + control.getFailedJobs().toString());
    }

    // Stop the JobControl thread; its run() loop does not exit
    // on its own once all jobs have finished.
    control.stop();
}

This won't work because a static field exists once per loaded class inside a single JVM. Its value does not carry over to other JVM instances (much less across a network).

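On a cluster, each map task runs in its own JVM, usually on a different node from the one running the tool runner. (In standalone/local mode the tasks run inside the driver's JVM, which is why the code appears to work there.) The framework instantiates the mapper from its class name alone, so the mapper sees only the static initializers (BIG_DATA = false), not the values you set in the tool runner.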
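The effect on RowMPreMap can be reproduced without Hadoop. The sketch below copies only the tokenizing step of map(); the class name SplitDemo and the sample records are made up for illustration. With the flag stuck at false, a "::"-separated line is not split at all, so line[1] in map() would throw ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

public class SplitDemo {

    // Mirrors the tokenizing step of RowMPreMap.map(): "::"-separated lines
    // when the big-data flag is set, single-whitespace-separated otherwise.
    static String[] tokenize(String line, boolean bigData) {
        return bigData ? line.split("::") : line.split("\\s");
    }

    public static void main(String[] args) {
        String bigLine = "1::122::5::838985046";      // hypothetical sample record
        String smallLine = "196\t242\t3\t881250949";  // hypothetical sample record

        // Correct flag: four fields.
        System.out.println(Arrays.toString(tokenize(bigLine, true)));

        // Flag left at its default (false): the whole line is one token,
        // so indexing line[1] would fail.
        System.out.println(tokenize(bigLine, false).length);

        // Small dataset with the default flag: tab-separated fields split fine.
        System.out.println(Arrays.toString(tokenize(smallLine, false)));
    }
}
```

Note that "\\s" splits on a single whitespace character; if the input may contain runs of spaces, "\\s+" avoids empty tokens.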

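This is one reason the configuration framework exists: values the driver puts into the job's configuration are shipped with the job to every task, where the mapper can read them back.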
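In the old mapred API used here, the usual pattern is to call setBoolean on each job's JobConf in the driver before the jobs are submitted, and to read the value back in the mapper by overriding configure(JobConf) from MapReduceBase, for example bigData = job.getBoolean("uv.big.data", false). The key name "uv.big.data" is an assumption; any unique string works. The sketch below mimics that flow in plain Java so it runs without Hadoop: java.util.Properties serialized to XML stands in for the job configuration Hadoop ships to each task, and ConfigPassingSketch, RowPreMapSketch, driver and startTask are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ConfigPassingSketch {

    // Hypothetical configuration key.
    static final String BIG_DATA_KEY = "uv.big.data";

    // Driver side: record the flag in the job configuration and serialize it.
    // Stands in for JobConf.setBoolean(...) plus Hadoop writing the job config.
    static byte[] driver(boolean bigData) {
        Properties conf = new Properties();
        conf.setProperty(BIG_DATA_KEY, Boolean.toString(bigData));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            conf.storeToXML(out, "job configuration");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Task side: the mapper takes its settings only from the shipped
    // configuration, never from a static field set by the driver.
    static class RowPreMapSketch {
        private boolean bigData = false;

        // Stands in for MapReduceBase.configure(JobConf job).
        void configure(Properties job) {
            bigData = Boolean.parseBoolean(job.getProperty(BIG_DATA_KEY, "false"));
        }

        String[] tokenize(String line) {
            return bigData ? line.split("::") : line.split("\\s");
        }
    }

    // Simulates a task start: deserialize the config, build the mapper, configure it.
    static RowPreMapSketch startTask(byte[] shippedConf) {
        Properties job = new Properties();
        try {
            job.loadFromXML(new ByteArrayInputStream(shippedConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        RowPreMapSketch mapper = new RowPreMapSketch();
        mapper.configure(job);
        return mapper;
    }

    public static void main(String[] args) {
        RowPreMapSketch mapper = startTask(driver(true));
        System.out.println(mapper.tokenize("1::122::5::838985046").length); // prints 4
    }
}
```

The point of the design is that the only channel between driver and task is the serialized configuration, which is exactly the constraint a real cluster imposes.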
