
Flume to HDFS splits a file into lots of files

I am trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:

...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.path = hdfs://***
tier1.sinks.hdfs-sink.fileType = DataStream
tier1.sinks.hdfs-sink.rollSize = 0

The source is a spooldir, the channel is memory, and the sink is hdfs.
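
For reference, a minimal sketch of what the full agent definition might look like is shown below; the spool directory and HDFS path are assumed placeholders, since the original configuration is only partially shown. Note that the HDFS sink's own properties carry the hdfs. prefix:

# Sketch of a complete spooldir -> memory -> hdfs agent; names and paths are assumed
tier1.sources  = spool-source
tier1.channels = memory-channel
tier1.sinks    = hdfs-sink

# Spooling-directory source; /var/log/flume-spool is a placeholder path
tier1.sources.spool-source.type     = spooldir
tier1.sources.spool-source.spoolDir = /var/log/flume-spool
tier1.sources.spool-source.channels = memory-channel

# In-memory channel
tier1.channels.memory-channel.type     = memory
tier1.channels.memory-channel.capacity = 10000

# HDFS sink; the hdfs. prefix is required on the sink's own properties
tier1.sinks.hdfs-sink.channel       = memory-channel
tier1.sinks.hdfs-sink.type          = hdfs
tier1.sinks.hdfs-sink.hdfs.path     = hdfs://namenode/flume/logs
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream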

I also tried sending a 1 MB file, and it was split into 1000 files, each about 1 KB in size. Another thing I noticed is that the transfer is very slow: 1 MB takes about a minute. Am I doing something wrong?

You also need to disable the roll timeout; this is done with the following settings:

tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

rollCount prevents rolling over by event count; rollInterval is set to 300 seconds here, and setting it to 0 disables the timeout. You will have to choose which mechanism you want to roll by, otherwise Flume will only close the files on shutdown.
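
For example, to roll only by file size, the sink could be configured roughly as follows; the 128 MB threshold is just an illustrative value, not something from the original answer:

# Sketch: roll the HDFS output file by size only (rollSize value is illustrative, ~128 MB)
# Count- and time-based rolling are disabled so that only rollSize triggers a roll
tier1.sinks.hdfs-sink.hdfs.rollSize = 134217728
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 0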

The defaults are as follows:

hdfs.rollInterval   30     Number of seconds to wait before rolling current file (0 = never roll based on time interval)
hdfs.rollSize       1024   File size to trigger roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10     Number of events written to file before it rolled (0 = never roll based on number of events)
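
With these defaults, a file is rolled after 30 seconds, 1024 bytes, or 10 events, whichever comes first, which is consistent with a 1 MB input ending up as roughly a thousand ~1 KB files when the size and count triggers are not effectively overridden.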

