Storing small files in HDFS and archiving them in a NiFi flow

I have an issue with small files and HDFS.

Scenario: I am using NiFi to read messages from a Kafka topic, and these messages are all really small.

Requirement: store these raw messages in HDFS (for replay capability) before doing any further processing on them.

I was thinking of running Hadoop Archive (HAR) on them periodically. Is that something I can do through NiFi? The har command looks like a command-line tool rather than something I could execute from NiFi. I would love to know of a solution that meets my requirement without bringing down HDFS because of the small files.

Ginil

You can execute a command line inside NiFi with the ExecuteProcess processor:

http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.ExecuteProcess/
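
As a rough illustration of what ExecuteProcess would be asked to run, here is a minimal Python sketch that assembles a `hadoop archive` command line. It follows the documented form `hadoop archive -archiveName <name>.har -p <parent> <src>* <dest>`. The helper names (`build_har_command`, `archive`) and the paths under `/data/...` are hypothetical, chosen only for this example. The sketch defaults to a dry run, so nothing is executed unless you ask for it.

```python
import subprocess
from typing import List, Sequence


def build_har_command(archive_name: str, parent: str,
                      sources: Sequence[str], dest: str) -> List[str]:
    """Build the argument list for `hadoop archive`.

    `sources` are paths relative to `parent`; `dest` is the HDFS directory
    in which the .har archive is created.
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end with .har")
    if not sources:
        raise ValueError("at least one source path is required")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent, *sources, dest]


def archive(archive_name: str, parent: str, sources: Sequence[str],
            dest: str, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False."""
    cmd = build_har_command(archive_name, parent, sources, dest)
    if not dry_run:
        # Requires the `hadoop` client on PATH and access to the cluster.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical layout: one directory of raw messages per day.
    cmd = archive("raw-2024-01-01.har", "/data/raw",
                  ["2024-01-01"], "/data/archive")
    print(" ".join(cmd))
```

In ExecuteProcess, the first element (`hadoop`) would go in the Command property and the remaining elements in Command Arguments. Alternatively, a script like this one could itself be the command that the processor runs on a schedule.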

You can also take a look at the Kafka Connect HDFS connector for writing Kafka records into HDFS.
