
Writing to file from jar run from Oozie shell

I have a jar file that needs to be run before our MapReduce process. It preprocesses the data that will later be fed into the MapReduce job. The jar works fine without Oozie, but I'd like to automate the workflow.

When run, the jar should accept two inputs, <input_file> and <output_dir>, and it is expected to write two files, <output_file_1> and <output_file_2>, under the specified <output_dir>.
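For reference, the standalone invocation that works outside Oozie would look like this (matching the arguments passed in the workflow below):

java -jar RI-Sequencer.jar log.csv /tmp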

This is the workflow:

<workflow-app name="RI" xmlns="uri:oozie:workflow:0.4">
    <start to="RI"/>
    <action name="RI">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>java</exec>
            <argument>-jar</argument>
            <argument>RI-Sequencer.jar</argument>
            <argument>log.csv</argument>
            <argument>/tmp</argument>
            <file>/user/root/algo/RI-Sequencer.jar#RI-Sequencer.jar</file>
            <file>/user/root/algo/log.csv#log.csv</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

I run the task using Hue, and currently I can't get the output of the process written to files. The job runs fine, but the expected files are nowhere to be found.

I have also changed the output directory to be in HDFS, but the result is the same: no files are generated.

If it helps, this is a sample of the code from my jar file:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;

// Create the error file under the target directory. File.separator avoids
// hard-coding a Windows-style "\\", which Linux nodes treat as part of the filename.
File fileErr = new File(targetPath + File.separator + "input_RI_err.txt");
fileErr.createNewFile();
BufferedWriter textFileErr = new BufferedWriter(new FileWriter(fileErr));
//
// fill in the buffer with the result
//
textFileErr.close();

UPDATE: If it helps, I can upload the jar file for testing.

UPDATE 2: I've changed the jar to write to HDFS. It still doesn't work when the job is executed through Oozie; running the job independently works.

It seems like you are creating a regular output file (on the local filesystem, not HDFS). Since the job runs on one of the nodes of the cluster, the output ends up under the local /tmp of whichever machine was picked.

I do not understand why you want to preprocess the data before MapReduce; I don't think it is very effective. But as Roamin said, you are saving your output file to the local filesystem (the file should be in your user home folder, ~/). If you want to save your data into HDFS directly from Java (without using the MapReduce library), see How to write a file in HDFS using hadoop or Write a file in hdfs with java.
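For illustration, here is a minimal sketch of writing the same file directly to HDFS with the Hadoop FileSystem API; the configuration setup and the assumption that targetPath is an HDFS directory are mine, not taken from the original jar:

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Picks up core-site.xml from the classpath, so fs.defaultFS points
// at the cluster's HDFS rather than the local filesystem.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

// Write into HDFS instead of the node-local filesystem
// (targetPath is assumed here to be an HDFS directory such as /tmp).
Path errPath = new Path(targetPath + "/input_RI_err.txt");
BufferedWriter textFileErr = new BufferedWriter(
        new OutputStreamWriter(fs.create(errPath, true)));
// ... fill in the buffer with the result ...
textFileErr.close();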

Alternatively, you can generate your file in a local directory and then load it into HDFS with this command:

hdfs dfs -put <localsrc> ... <dst>
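For example, copying the generated file from the node-local /tmp into HDFS (the destination path is illustrative):

hdfs dfs -put /tmp/input_RI_err.txt /user/root/output/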
