
Can a MapReduce job in Oozie read from a file?

When creating a workflow in Oozie, I have a first Java step that generates a file containing the list of files I need for the next step (a map-reduce). How can I feed that map-reduce job with that file?

I know that I could tick the Capture output box of the Java step and then use mapred.input.dir in the map-reduce step to use that captured output as input. But I want to move away from that approach.
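For reference, a minimal sketch of what that "Capture output" approach looks like on the Java side, assuming the box corresponds to Oozie's <capture-output/> element and using a made-up property name input.dirs: the Java action writes a properties file to the path Oozie passes in the oozie.action.output.properties system property, and a later action can read it back with wf:actionData().

    // Hypothetical first action: emit the comma-separated directory list as captured output.
    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.Properties;

    public class ListInputDirs {
        public static void main(String[] args) throws Exception {
            // Placeholder for whatever logic builds the list of input directories.
            String inputDirs = buildInputDirList();

            // Oozie tells the action where to write its captured output.
            String outputFile = System.getProperty("oozie.action.output.properties");
            Properties props = new Properties();
            props.setProperty("input.dirs", inputDirs);  // property name is illustrative
            try (OutputStream os = new FileOutputStream(outputFile)) {
                props.store(os, null);
            }
        }

        private static String buildInputDirList() {
            // Truncated sample; the real code would compute the full list.
            return "/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/18,...";
        }
    }

The map-reduce action would then set mapred.input.dir to ${wf:actionData('ListInputDirs')['input.dirs']}.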

Just for the record, the content of my file looks like:

/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/18,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/19,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/20,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/21,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/22,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/23,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/24,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/25,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/26,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/27,/data/kafka/4/camus/DATA.TRADE.ORDERHISTORY/daily/2015/07/28

Do you want to use that file as an input file or a parameter file?

In the second case,

  • activate the <capture-output/> option for the initial Action
  • output something like "param.file=/a/b/c/z.txt"
  • in the next Action, use the appropriate EL function to retrieve the file name and pass it as a <property> or <env> variable

    ${wf:actionData("InitialActionName")["param.file"]}

  • then use a few lines of Java to open that HDFS file and do whatever you are supposed to do with its content, before doing the actual Map or Reduce work (a minimal sketch follows below)
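A minimal sketch of that last step, assuming the file name was passed to the map-reduce action as a <property> named param.file (the name is illustrative); the same few lines could live in a Mapper/Reducer setup() or in any Java code that runs before the actual Map or Reduce work:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ParamFileReader {
        /** Reads the parameter file from HDFS and returns the comma-separated path list. */
        public static String readParamFile(Configuration conf) throws Exception {
            // "param.file" is the illustrative <property> set from wf:actionData(...)
            Path paramFile = new Path(conf.get("param.file"));
            FileSystem fs = paramFile.getFileSystem(conf);
            StringBuilder content = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(paramFile), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    content.append(line.trim());
                }
            }
            return content.toString();  // e.g. split on ',' to get the individual directories
        }
    }

The returned string can then be split on ',' and used however the job needs, for example to filter records or open those directories explicitly.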
