I'm working with NiFi and Hive in an HDP sandbox in Ambari.
I have a NiFi flow where I upload modified files to HDFS and then, with a GenerateFlowFile processor, I pass the query load data inpath 'hdfs/path/' into table tablename
to a PutHiveQL processor.
It works great, but I would like to do that ONLY when there are files in the path specified by 'hdfs/path', because when the load data inpath command
executes, that HDFS directory is emptied.
I don't know how I can do that.
Thank you so much!!
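Use the ListHDFS
processor: set its Directory property to the HDFS directory and schedule the processor to run frequently (for example, every minute). ListHDFS emits one flowfile per newly listed file, so nothing reaches the rest of the flow when the directory is empty.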
Then use a ReplaceText
processor with these properties:
Replacement Strategy: Always Replace
Replacement Value: load data inpath '${path}/${filename}' into table tablename
Then connect the success relationship to a PutHiveQL
processor to execute the load data command.
Flow:
1. ListHDFS
2. ReplaceText
3. PutHiveQL
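To illustrate why this flow only runs the query when files exist, here is a minimal Python sketch of what the ReplaceText step effectively produces. It is not NiFi code: the function names and the sample listing are hypothetical, and it only mimics substituting the path and filename attributes that ListHDFS writes on each flowfile into the replacement value.

```python
import re

# Template mirroring the ReplaceText replacement value from the answer.
TEMPLATE = "load data inpath '${path}/${filename}' into table tablename"


def render_load_statement(attributes):
    """Substitute ${name} placeholders with flowfile attribute values."""
    return re.sub(r"\$\{(\w+)\}", lambda m: attributes[m.group(1)], TEMPLATE)


def statements_for(listing):
    """Build one statement per listed file; an empty listing yields none."""
    return [render_load_statement(attrs) for attrs in listing]


# Hypothetical listing: two files found under a made-up directory.
listing = [
    {"path": "/hdfs/path", "filename": "file1.csv"},
    {"path": "/hdfs/path", "filename": "file2.csv"},
]

for statement in statements_for(listing):
    print(statement)
```

With an empty listing, statements_for returns an empty list, which corresponds to no flowfile reaching PutHiveQL and therefore no load data command being executed.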