
How can I execute a command only when there are files in HDFS?

I'm working with NiFi and Hive in an HDP sandbox in Ambari.

I have a NiFi flow where I upload modified files to HDFS and then, with a GenerateFlowFile processor, pass the query load data inpath 'hdfs/path/' into table tablename to a PutHiveQL processor.

It works great, but I would like to do that ONLY when there are files in the path specified by 'hdfs/path', because when the load data inpath command executes, that HDFS directory is emptied.

I don't know how to do that.

Thank you so much!!

Use a ListHDFS processor: set its Directory property to the HDFS path and schedule the processor to run frequently (for example, every minute).

  • This processor stores state and runs incrementally, so it only outputs a flowfile when it detects newly added files in the directory.
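The incremental behaviour described above can be pictured with a small Python model. This is only a sketch of the idea, not NiFi's actual implementation: the names FileEntry and IncrementalLister are hypothetical, and it assumes that the stored state is simply the newest modification timestamp seen so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One file in a directory listing (hypothetical model, not a NiFi class)."""
    path: str       # directory that contains the file
    filename: str   # file name without the directory
    modified: int   # modification time in epoch milliseconds


@dataclass
class IncrementalLister:
    """Remembers the newest timestamp seen and reports only newer files."""
    last_seen: int = -1

    def list_new(self, entries):
        # Keep only files modified after the stored state, oldest first.
        new = sorted(
            (e for e in entries if e.modified > self.last_seen),
            key=lambda e: e.modified,
        )
        if new:
            # Advance the state so the same files are not reported again.
            self.last_seen = new[-1].modified
        return new


lister = IncrementalLister()
listing = [
    FileEntry("/data/incoming", "a.csv", 1000),
    FileEntry("/data/incoming", "b.csv", 2000),
]
first_run = lister.list_new(listing)    # both files are new
second_run = lister.list_new(listing)   # nothing changed, nothing emitted
```

When a run finds nothing new, no flowfile is produced, so the downstream processors are never triggered for an empty directory.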

Then use a ReplaceText processor with:

  • Replacement Strategy set to Always Replace
  • Replacement Value set to load data inpath '${path}/${filename}' into table tablename
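To see what the replacement value turns into, here is a minimal Python sketch that substitutes the path and filename attributes into the statement. It relies on Python's string.Template, which happens to use the same ${name} syntax for plain references; it does not model the rest of the NiFi Expression Language. The attribute values /data/incoming and part-0001.csv are made up for illustration.

```python
from string import Template

# Same text as the Replacement Value above.
REPLACEMENT_VALUE = Template(
    "load data inpath '${path}/${filename}' into table tablename"
)


def build_statement(attributes):
    """Fill the template from a flowfile's attributes (a plain dict here)."""
    return REPLACEMENT_VALUE.substitute(
        path=attributes["path"],
        filename=attributes["filename"],
    )


# Hypothetical attributes for one listed file.
statement = build_statement(
    {"path": "/data/incoming", "filename": "part-0001.csv"}
)
print(statement)
# load data inpath '/data/incoming/part-0001.csv' into table tablename
```

Each listed file therefore yields its own load data statement that points at that single file rather than at the whole directory.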

Then connect the success relationship to a PutHiveQL processor to execute the load data command.

Flow:

1. ListHDFS
2. ReplaceText
3. PutHiveQL
