
How come Flume-NG HDFS sink does not write to file when the number of events equals or exceeds the batchSize?

I am trying to configure Flume such that logs roll hourly or when they reach the default block size of HDFS (64 MB). Below is my current configuration:

imp-agent.channels.imp-ch1.type = memory
imp-agent.channels.imp-ch1.capacity = 40000
imp-agent.channels.imp-ch1.transactionCapacity = 1000

imp-agent.sources.avro-imp-source1.channels = imp-ch1
imp-agent.sources.avro-imp-source1.type = avro
imp-agent.sources.avro-imp-source1.bind = 0.0.0.0
imp-agent.sources.avro-imp-source1.port = 41414

imp-agent.sources.avro-imp-source1.interceptors = host1 timestamp1
imp-agent.sources.avro-imp-source1.interceptors.host1.type = host
imp-agent.sources.avro-imp-source1.interceptors.host1.useIP = false
imp-agent.sources.avro-imp-source1.interceptors.timestamp1.type = timestamp

imp-agent.sinks.hdfs-imp-sink1.channel = imp-ch1
imp-agent.sinks.hdfs-imp-sink1.type = hdfs
imp-agent.sinks.hdfs-imp-sink1.hdfs.path = hdfs://mynamenode:8020/flume/impressions/yr=%Y/mo=%m/d=%d/logger=%{host}s1/
imp-agent.sinks.hdfs-imp-sink1.hdfs.filePrefix = Impr
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576

imp-agent.channels = imp-ch1
imp-agent.sources = avro-imp-source1
imp-agent.sinks = hdfs-imp-sink1

My intention with the configuration above is to write to HDFS in batches of 10 and then roll the file being written to hourly. What I am seeing is that all of the data appears to be held in memory, since I stay under 64 MB, until the file rolls after 1 hour. Are there any settings I should be tweaking in order to get my desired behavior?

To answer my own question: Flume is writing the data to HDFS in batches. The reported file length simply lags while the file is open, because a block is still in the process of being written to.
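
For reference, here is a minimal sketch of the sink options involved, as documented for the Flume-NG HDFS sink (the values are illustrative, not a recommendation): hdfs.batchSize controls how many events are written before the sink flushes to HDFS, while the roll* settings only decide when the in-progress .tmp file is closed and renamed.

# batchSize: events written to the file before a flush to HDFS;
# after each flush the data is on HDFS even if the listed length lags.
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
# roll settings: when to close and rename the in-progress file.
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
# idleTimeout (0 = disabled, the default) would additionally close a file
# after this many seconds without new events.
imp-agent.sinks.hdfs-imp-sink1.hdfs.idleTimeout = 0

You can usually confirm that the batches are actually landing by reading the open .tmp file directly (for example with hdfs dfs -cat); the length shown by a plain listing only catches up once the block is closed.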
