
NiFi: Reference FlowFile content in ExecuteSQL Processor

Is it possible to reference a FlowFile's content in a subsequent ExecuteSQL processor?

For example:

  • I'm using GenerateTableFetch and ExecuteSQL to poll a database table.
  • Next, I use QueryRecord to transform the result -- specifically, to apply MAX() and GROUP BY operations, because I can't use these operators with the GenerateTableFetch processor.


SELECT
    hu_id
    ,wh_id
    ,MAX(audit_timestamp) AS "newest_timestamp"
FROM FLOWFILE
GROUP BY
    hu_id
    ,wh_id

  • I would love to be able to then use another ExecuteSQL to do something like:


SELECT
    FLOWFILE.hu_id
    ,FLOWFILE.wh_id
    ,FLOWFILE.newest_timestamp
    ,hum.status
    ,hum.location_id
FROM FLOWFILE
INNER JOIN AAD.dbo.t_hu_master hum ON
    FLOWFILE.hu_id = hum.hu_id
    AND FLOWFILE.wh_id = hum.wh_id

... effectively referencing the Avro FlowFile content to perform a multi-join.

If this isn't possible, then is there an elegant workaround? So far, the only solution I can come up with is ...

  1. SplitAvro
  2. ConvertAvroToJSON
  3. EvaluateJSONPath
  4. ReplaceText (to create a bunch of individual SQL SELECT statements with the wh_id and hu_id), and then ...
  5. ExecuteSQL
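
To make step 4 concrete, here is a minimal Python sketch of the "one SELECT per record" idea, written outside NiFi purely for illustration. The function name `build_select_statements` and the sample records are hypothetical. The sketch also emits parameter lists instead of pasting values into the SQL text, which is an assumption on my part and not what ReplaceText alone would do.

```python
def build_select_statements(records):
    """Return one (sql, params) pair per record, mirroring the per-record workaround."""
    # The table and column names come from the question; the "?" placeholders
    # are an illustrative choice so values are never concatenated into the SQL.
    sql = (
        "SELECT hum.hu_id, hum.wh_id, hum.status, hum.location_id "
        "FROM AAD.dbo.t_hu_master hum "
        "WHERE hum.hu_id = ? AND hum.wh_id = ?"
    )
    return [(sql, (rec["hu_id"], rec["wh_id"])) for rec in records]


# Hypothetical rows, shaped like the QueryRecord output in the question.
records = [
    {"hu_id": "HU1", "wh_id": "WH1", "newest_timestamp": "2019-08-01 10:00:00"},
    {"hu_id": "HU2", "wh_id": "WH1", "newest_timestamp": "2019-08-01 11:00:00"},
]
statements = build_select_statements(records)
```

Each pair would correspond to one split FlowFile being handed to ExecuteSQL, which is why this approach costs one database round trip per record.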

Any thoughts or insights are appreciated!

In the upcoming 1.10.0 release, you'll be able to do a lookup from a database using LookupRecord and the new DatabaseRecordLookupService (see NIFI-6082), which effectively does a join. In the meantime, I think you'll need something like what you have, or a scripted processor (see ExecuteGroovyScript's Additional Details page) to do the lookup yourself.
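
As a rough illustration of what "do the lookup yourself" means, here is a Python sketch of the per-record lookup logic. It is not NiFi or ExecuteGroovyScript code: an in-memory SQLite table stands in for `AAD.dbo.t_hu_master`, and the `enrich` helper and sample data are hypothetical. A scripted processor would apply the same idea to the records read from the FlowFile.

```python
import sqlite3


def enrich(records, conn):
    """Attach status and location_id to each record that has a match.

    Records without a match are dropped, which gives INNER JOIN semantics.
    """
    enriched = []
    for rec in records:
        row = conn.execute(
            "SELECT status, location_id FROM t_hu_master "
            "WHERE hu_id = ? AND wh_id = ?",
            (rec["hu_id"], rec["wh_id"]),
        ).fetchone()
        if row is not None:
            enriched.append({**rec, "status": row[0], "location_id": row[1]})
    return enriched


# Stand-in for the real table (assumption: SQLite instead of the real database).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_hu_master (hu_id TEXT, wh_id TEXT, status TEXT, location_id TEXT)"
)
conn.executemany(
    "INSERT INTO t_hu_master VALUES (?, ?, ?, ?)",
    [("HU1", "WH1", "A", "LOC-01"), ("HU3", "WH1", "H", "LOC-09")],
)

# Hypothetical rows, shaped like the QueryRecord output in the question.
records = [
    {"hu_id": "HU1", "wh_id": "WH1", "newest_timestamp": "2019-08-01 10:00:00"},
    {"hu_id": "HU2", "wh_id": "WH1", "newest_timestamp": "2019-08-01 11:00:00"},
]
result = enrich(records, conn)
```

Only the HU1 record survives here, because HU2 has no row in the stand-in table.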
