NIFI: Ingesting monthly dump from SQL query into SFTP server as CSV files [closed]

I would like to use a SQL query to export data to an SFTP server as one CSV file per month.

For instance, my query is:

    select fooId, bar from FooBar
    where query_date>=20180101 and query_date<20180201 --(for the month of January 2018)

I would like to store the result as 20180101_FooBar.csv on my SFTP server. Files for other months follow the same process with a different query_date interval.
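
To make the per-month interval and file name concrete, here is a minimal Python sketch of the pattern described above. The function name `monthly_windows` and its parameters are hypothetical and only illustrate the arithmetic; this is not NiFi code.

```python
from datetime import date


def monthly_windows(first_year, first_month, count, table="FooBar"):
    """Yield (start, end, filename) for `count` consecutive months.

    start/end are yyyyMMdd integers matching the query_date bounds
    (start inclusive, end exclusive).
    """
    year, month = first_year, first_month
    for _ in range(count):
        start = date(year, month, 1)
        # Roll over to January of the next year after December.
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1
        end = date(year, month, 1)
        yield (
            int(start.strftime("%Y%m%d")),
            int(end.strftime("%Y%m%d")),
            f"{start:%Y%m%d}_{table}.csv",
        )
```

For example, `list(monthly_windows(2018, 1, 1))` gives the January 2018 bounds from the query above together with the file name 20180101_FooBar.csv.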

One important requirement: I have to store *fooId* as an MD5 hash string.

How can I automate this flow in NiFi?

Roughly, the flow that I foresee is:

*ExecuteSQL* (but I am not sure how to parameterize the counter for query_date)
-> *ConvertAvroToJSON*
-> *EvaluateJsonPath* (to extract the fooId)
-> *HashContent* 
-> *MergeContent* 
-> *PutSFTP*

Please advise on how I may take this forward.

For this case I could think of three approaches.

Approach 1: execute the SQL query with an MD5 function to get the hash value of fooId

Flow:

  1. GenerateFlowFile //add startdate,enddate attributes

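     startdate -> ${now():format("yyyyMM"):minus(1):append("01")}
     enddate -> ${now():format("yyyyMM"):append("01")}

     Note: minus(1) is plain integer subtraction on the yyyyMM value, so in January it produces a month of 00 (for example 201800); the year boundary needs separate handling.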
  2. ExecuteSQL //select md5(fooId) fooId, bar from FooBar where

     query_date>=${startdate} and query_date<${enddate}

    Change the above query as per your source database so that it returns the MD5 hash value of the column.

  3. ConvertRecord //convert Avro format to Json format

  4. UpdateAttribute //change the filename
  5. PutSFTP //store the file.
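
As a cross-check for the startdate and enddate attributes in step 1, the previous-month window can be sketched in Python with calendar-aware arithmetic, which also behaves correctly across the year boundary. The function name `previous_month_window` is hypothetical and this is a standalone illustration, not something NiFi runs as-is.

```python
from datetime import date, timedelta


def previous_month_window(today):
    """Return (startdate, enddate) as yyyyMMdd strings for the month before `today`.

    startdate is the first day of the previous month (inclusive),
    enddate is the first day of the current month (exclusive).
    """
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st always lands in the previous month,
    # including the December of the previous year when run in January.
    last_of_previous_month = first_of_this_month - timedelta(days=1)
    first_of_previous_month = last_of_previous_month.replace(day=1)
    return (
        first_of_previous_month.strftime("%Y%m%d"),
        first_of_this_month.strftime("%Y%m%d"),
    )
```

Calling `previous_month_window(date.today())` returns the two values that the query compares query_date against, assuming query_date is stored in yyyyMMdd form as in the question.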

Approach 2: create the MD5 hash value in NiFi

Flow:

  1. GenerateFlowFile //add startdate,enddate attributes

     startdate -> ${now():format("yyyyMM"):minus(1):append("01")}
     enddate -> ${now():format("yyyyMM"):append("01")}
  2. ExecuteSQL //select fooId, bar from FooBar where

     query_date>=${startdate} and query_date<${enddate}

    Change the above query as per your source; in this approach the MD5 hash is computed later in NiFi, not in the query.

  3. ConvertRecord //convert Avro format to Json format

  4. SplitJson //split the array of json into individual flowfiles
  5. EvaluateJsonPath //extract all the key values as flowfile attributes except for fooId key.
  6. EvaluateJsonPath //overwrite the flowfile content with fooId value
  7. HashContent //get the hash value for the flowfile content with MD5 algorithm
  8. AttributesToJSON //recreate the json message with new hash md5 value
  9. MergeContent //create json array with the Defragment merge strategy
  10. UpdateAttribute //change the filename
  11. PutSFTP //store the file.

Another way (the third approach) is to write a script that parses the JSON array, computes the MD5 hash of each fooId value, and writes the message back out with the hashed value.
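
A minimal sketch of such a script in plain Python is shown below. It is standalone and not wired into any NiFi scripting processor; the function name `hash_foo_ids` is hypothetical, and it assumes the flowfile content is a JSON array of flat objects and that hashing the UTF-8 string form of fooId is acceptable (a database md5 function may treat non-string types differently).

```python
import hashlib
import json


def hash_foo_ids(json_text, key="fooId"):
    """Return the JSON array with each record's `key` replaced by its MD5 hex digest."""
    records = json.loads(json_text)
    for record in records:
        # Assumption: hash the string representation of the value, UTF-8 encoded.
        value = str(record[key]).encode("utf-8")
        record[key] = hashlib.md5(value).hexdigest()
    return json.dumps(records)
```

Because the whole array is handled in one pass, this avoids the split, hash and merge steps of Approach 2, at the cost of maintaining a script.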

I have uploaded templates for both Approach 1 and Approach 2. Save them and upload them to your NiFi instance for reference, then use the approach that best fits your case.
