Cron job for running Hadoop programs in Linux

Question

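I am new to Linux, and our project uses Hadoop. We have written three MapReduce programs: the output of the first is the input to the second, and the output of the second is the input to the third. At present, running them means launching three separate configurations: first the first program, then the second, then the third. We would like to run all three programs one after another. Is it possible to use a cron job in Linux for this? If so, please describe the steps. We need to use cron because the three programs have to run repeatedly, every three hours.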

Answer 1

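1. Create a shell script that uses && to run the Hadoop programs in sequence. Write the first command, then &&, then the second command, and so on. Each command runs only if the one before it exited successfully.

For example: first command && second command && third command

2. In the terminal, type:

crontab -e

This opens your crontab in an editor in the terminal.

Add this line to run the shell script every 15 minutes:

*/15 * * * * /path/to/your/shell/script

For more help with crontab, see https://help.ubuntu.com/community/CronHowto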
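The two steps above can be sketched as a small script. This is only an illustration: the jar paths, the driver class names (FirstDriver, SecondDriver, ThirdDriver), and the HDFS directories are placeholders invented here, not taken from the question.

```shell
#!/bin/sh
# run_pipeline: run the three MapReduce jobs in order.
# Because of &&, a job starts only if the previous one exited with status 0,
# so the chain stops at the first failure.
# All jar paths, class names, and HDFS directories are hypothetical placeholders.
run_pipeline() {
    hadoop jar /path/to/first.jar FirstDriver /your/hdfs/input /your/hdfs/output1 &&
    hadoop jar /path/to/second.jar SecondDriver /your/hdfs/output1 /your/hdfs/output2 &&
    hadoop jar /path/to/third.jar ThirdDriver /your/hdfs/output2 /your/hdfs/output3
}
```

End the script with a call to run_pipeline, make it executable, and point the crontab entry at it. For the three-hour interval the question asks about, the schedule would be 0 */3 * * * rather than */15 * * * *. Cron starts jobs with a minimal environment, so the script may need the full path to hadoop or its own PATH setting.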

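Deleting or copying the output directory:

A Hadoop job fails with a "directory already exists" error if its output directory is already present. To avoid this, delete the output directory before the sequence of Hadoop jobs runs, copying it elsewhere first if you need to keep it. Add the relevant commands to your shell script before the Hadoop job commands: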

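# Delete the output directory in HDFS
hadoop fs -rmr /your/hdfs/output/directory/to/be/deleted
# Copy the output directory from HDFS to HDFS (do this before deleting if you want to keep the old output)
hadoop fs -mkdir /new/hdfs/location
# The glob is quoted so that Hadoop expands it, not the local shell
hadoop fs -cp '/your/hdfs/output/directory/to/be/copied/*' /new/hdfs/location
# Copy from HDFS to the local filesystem
# (created without sudo so that the user running the script can write to it)
mkdir -p /path/to/local/filesystem
hadoop fs -copyToLocal '/your/hdfs/output/directory/to/be/copied/*' /path/to/local/filesystem

Note: on recent Hadoop versions, replace hadoop fs with hdfs dfs and -rmr with -rm -r. When copying a directory, remember the trailing /* so that all of its contents are copied. A pattern such as *.* would skip files without a dot in their name, which includes the usual part-r-00000 output files. Change the HDFS paths to match your configuration.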
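Since the script runs repeatedly under cron, the delete step should also behave sensibly on the very first run, when no output directory exists yet. A minimal sketch, assuming a recent Hadoop where hdfs dfs -test -d is available; the function name clean_output_dir is made up for this example.

```shell
#!/bin/sh
# clean_output_dir HDFS_DIR
# Remove HDFS_DIR only if it exists, so the first run (no output yet)
# does not report an error. The function name is a hypothetical helper.
clean_output_dir() {
    # -test -d exits with 0 only when the path exists and is a directory
    if hdfs dfs -test -d "$1"; then
        hdfs dfs -rm -r "$1"
    fi
}
```

It would be called once per output directory before the jobs start, for example clean_output_dir /your/hdfs/output/directory/to/be/deleted.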

Answer 2

The best way to handle this situation is to chain the MapReduce jobs.

http://gandhigeet.blogspot.in/2012/12/as-discussed-in-previous-post-hadoop.html

I am posting the driver code that invokes the three MapReduce jobs.

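import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// ExerciseMapper1..3, ExerciseReducer1..3, ExerciseCombiner and
// ParagrapghInputFormat are the application's own classes and are not shown here.
public class ExerciseDriver {

    private static Configuration conf;

    public static void main(String[] args) throws Exception {

        if (args.length < 2) {
            System.err.println("Too few arguments. Arguments should be: <hdfs input folder> <hdfs output folder>");
            System.exit(1);
        }

        ExerciseDriver driver = new ExerciseDriver();
        conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        String pathin1 = args[0];
        String pathout1 = args[1];

        // Remove output left over from a previous run; a job fails if its
        // output directory already exists.
        fs.delete(new Path(pathout1 + "_1"), true);
        fs.delete(new Path(pathout1 + "_2"), true);
        fs.delete(new Path(pathout1 + "_3"), true);

        // Run the jobs in order. Each job reads the previous job's output,
        // and a later job runs only if the earlier one succeeded.
        int status = driver.runFirstJob(pathin1, pathout1 + "_1");
        if (status == 0) {
            status = driver.runSecondJob(pathout1 + "_1", pathout1 + "_2");
        }
        if (status == 0) {
            status = driver.runThirdJob(pathout1 + "_2", pathout1 + "_3");
        }

        // A non-zero exit status lets a calling script or cron log see the failure.
        System.exit(status == 0 ? 0 : 1);
    }

    public int runFirstJob(String pathin, String pathout) throws Exception {
        // Job.getInstance(conf) replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper1.class);
        job.setCombinerClass(ExerciseCombiner.class);
        job.setReducerClass(ExerciseReducer1.class);
        job.setInputFormatClass(ParagrapghInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));

        // waitForCompletion submits the job and blocks until it finishes,
        // so no separate job.submit() call is needed.
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }

    public int runSecondJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper2.class);
        job.setReducerClass(ExerciseReducer2.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }

    public int runThirdJob(String pathin, String pathout) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(ExerciseDriver.class);
        job.setMapperClass(ExerciseMapper3.class);
        job.setReducerClass(ExerciseReducer3.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(pathin));
        FileOutputFormat.setOutputPath(job, new Path(pathout));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : -1;
    }
}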

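After that, schedule the jar file in crontab. Alternatively, you can use Oozie. As the driver class shows, the three MapReduce jobs execute one after another: the output of the first is the input to the second, and the output of the second is the input to the third.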
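Cron runs a command line rather than a jar directly, so in practice the crontab entry would point at a small wrapper script. A sketch under stated assumptions: the jar path and the HDFS input and output folders are placeholders invented here, and the exit status is only meaningful if the driver reports job failures through it.

```shell
#!/bin/sh
# Hypothetical locations; adjust them to your installation.
JAR=/path/to/exercise-driver.jar
INPUT=/your/hdfs/input
OUTPUT=/your/hdfs/output

# run_driver: start the chained driver once and log when it began and how it ended.
run_driver() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting ExerciseDriver"
    status=0
    hadoop jar "$JAR" ExerciseDriver "$INPUT" "$OUTPUT" || status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') ExerciseDriver exited with status $status"
    return "$status"
}
```

With a call to run_driver as the last line of the script, the crontab command could be /path/to/run_driver.sh >> /path/to/driver.log 2>&1, which keeps the output of every run in one log file.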

Answer 1
2 2015-04-14 17:18:02

Answer 2
0 2015-04-14 06:08:04
