
Access Oozie context from a java action

I have the following use case. In an Oozie workflow, a map-reduce action generates a series of diagnostic counters. I want a java action to follow the map-reduce action, validate those counters, and generate notifications based on the validation conditions and results. For this to work, the java action must be able to access all counters from the upstream map-reduce action, just as Oozie can access them with EL in its workflow XML.

Right now I have no idea where to start for this. So, any pointer is very much appreciated.

Update
For example, suppose I have a map-reduce action named foomr. In the Oozie workflow XML, you can use EL to access counters, e.g. ${hadoop:counters("foomr")[RECORDS][MAP_IN]}. My question is: how can I get the same counter inside a java action? Does Oozie expose any API to access the values that are available to EL in a workflow XML?

You can use the capture-output tag to capture the output of a java action. This output, written in Java properties format, can then be propagated to other Oozie nodes.

The capture-output element can be used to propagate values back into Oozie context, which can then be accessed via EL-functions. This needs to be written out as a java properties format file. (From documentation page of oozie).
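A minimal sketch of the Java side of capture-output, assuming the java action declares `<capture-output/>`. Oozie supplies the path of the file to write in the `oozie.action.output.properties` system property. The class name, the `writeOutput` helper, and the `validation.status` key are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

public class CaptureOutputExample {

    // Writes the given properties to the target file in Java properties format.
    static void writeOutput(Properties props, File target) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            props.store(out, null);
        }
    }

    public static void main(String[] args) throws IOException {
        // Oozie sets this system property for java actions that capture output.
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null) {
            throw new IllegalStateException("oozie.action.output.properties is not set");
        }
        Properties props = new Properties();
        props.setProperty("validation.status", "OK"); // hypothetical key and value
        writeOutput(props, new File(path));
    }
}
```

A downstream node could then read the value with something like `${wf:actionData("my-java-action")["validation.status"]}`, where `my-java-action` is a placeholder node name. Note that this propagates values out of a java action; it does not by itself bring the map-reduce counters in.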

The example further below shows EL being used in a pig action. The following Hadoop EL constants are also available.

Hadoop EL Constants

RECORDS: Hadoop record counters group name.
MAP_IN: Hadoop mapper input records counter name.
MAP_OUT: Hadoop mapper output records counter name.
REDUCE_IN: Hadoop reducer input records counter name.
REDUCE_OUT: Hadoop reducer output records counter name.
GROUPS: Hadoop mapper/reducer record groups counter name.

The example below uses the EL function wf:user() to calculate paths dynamically. In a similar way, you can use the Hadoop EL constants above, or user-defined ones, in a workflow.

<pig>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
        <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/pig"/>
    </prepare>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <script>id.pig</script>
    <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/text</param>
    <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/pig</param>
</pig>

Edit:

You can also use the Oozie Java API, which returns the action data for a given action name:

org.apache.oozie.DagELFunctions.wf_actionData(String actionName).

Return the action data for an action.
Parameters: actionName action name.
Returns: value of the property.

I saw the following passage in the Oozie docs, under the Parameterization of Workflows section:

EL expressions can be used in the configuration values of action and decision nodes. They can be used in XML attribute values and in XML element and attribute values. They cannot be used in XML element and attribute names. They cannot be used in the name of a node and they cannot be used within the transition elements of a node.

oozie docs

I think Oozie does not expose the workflow action data inside action nodes. We can pass it in from outside as parameters to the java action.
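One way to pass a counter in from outside, sketched under assumptions: the workflow resolves the EL expression and hands the value to the java action as an `<arg>`, as outlined in the comment. The class name, the `validate` helper, and the threshold rule are hypothetical.

```java
public class CounterValidator {

    // Hypothetical rule: the run is valid when at least minRecords
    // map input records were read.
    static boolean validate(long mapInputRecords, long minRecords) {
        return mapInputRecords >= minRecords;
    }

    // Assumed workflow fragment, reusing the EL expression from the question:
    //   <java>
    //     ...
    //     <main-class>CounterValidator</main-class>
    //     <arg>${hadoop:counters("foomr")[RECORDS][MAP_IN]}</arg>
    //     <arg>1</arg>
    //   </java>
    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: CounterValidator <mapIn> <minRecords>");
        }
        long mapIn = Long.parseLong(args[0].trim());
        long minRecords = Long.parseLong(args[1].trim());

        if (!validate(mapIn, minRecords)) {
            // An uncaught exception sends the workflow down the action's error transition.
            throw new IllegalStateException(
                    "MAP_IN=" + mapIn + " is below the required minimum " + minRecords);
        }
        System.out.println("Validation passed: MAP_IN=" + mapIn);
    }
}
```

Because Oozie evaluates the EL before launching the action, the Java code only sees plain strings and needs no Oozie dependency.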

If Hadoop counters need to be accessed, I think you should check whether YARN or the JobTracker exposes a web services API to which you can pass a job name and get the corresponding counters back.
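As a rough sketch of the web-service route: the MapReduce JobHistory Server exposes job counters over REST at `/ws/v1/history/mapreduce/jobs/{jobid}/counters`. It is keyed by job ID rather than job name; the ID might be handed to the java action as an argument, for example via `${wf:actionExternalId("foomr")}`. The host, the port (19888 is the usual default), and the class and method names here are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HistoryServerCounters {

    // Builds the counters URL for a job on the given history server ("host:port").
    static String countersUrl(String historyServer, String jobId) {
        return "http://" + historyServer
                + "/ws/v1/history/mapreduce/jobs/" + jobId + "/counters";
    }

    // Performs a GET request and returns the response body (JSON) as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: history server as host:port, args[1]: Hadoop job ID
        System.out.println(fetch(countersUrl(args[0], args[1])));
    }
}
```

The response is a JSON document of counter groups; parsing it with a JSON library and applying the validation rules is left out of this sketch.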
