
Flume to HDFS split a file to lots of files

I am trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:

...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.path = hdfs://***
tier1.sinks.hdfs-sink.fileType = DataStream
tier1.sinks.hdfs-sink.rollSize = 0

The source is a spooldir, the channel is memory, and the sink is hdfs.

I also tried sending a 1 MB file, and it was split into 1000 files of 1 KB each. Another thing I noticed is that the transfer is slow; 1 MB takes about a minute. Am I doing something wrong?

You also need to disable the roll timeout; that's done with the following settings:

tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300

rollCount prevents rolling based on the number of events; rollInterval is set here to 300 seconds, and setting it to 0 would disable time-based rolling as well. You will have to choose which mechanism you want for rollovers, otherwise Flume will only close the files upon shutdown.
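Putting the question's sink section together with these settings, a minimal sketch of a sink that rolls only on the 300-second timer could look like this (the sink name is taken from the question; the hdfs. prefix on the sink properties follows the standard Flume HDFS sink convention and is assumed for the path/fileType lines):

# roll only on the 300-second timer, never on size or event count
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.hdfs.path = hdfs://***
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.rollSize = 0
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300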

The defaults are as follows:

hdfs.rollInterval   30      Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize       1024    File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10      Number of events written to the file before it is rolled (0 = never roll based on number of events)

With rollSize already set to 0 in the question, the remaining defaults still roll a new file every 10 events and every 30 seconds, which is why a 1 MB spool file ends up on HDFS as many tiny files.
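If you would rather roll on file size instead of time, a sketch of the opposite choice (size-based rolling only; the 128 MB threshold is just an illustrative value, not something from the question or answer) would be:

# roll only when the file reaches 128 MB, never on time or event count
tier1.sinks.hdfs-sink.hdfs.rollSize = 134217728
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 0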
