
How come Flume-NG HDFS sink does not write to file when the number of events equals or exceeds the batchSize?

I am trying to configure Flume such that logs roll hourly or when they reach the default block size of HDFS (64 MB). Below is my current configuration:

imp-agent.channels.imp-ch1.type = memory
imp-agent.channels.imp-ch1.capacity = 40000
imp-agent.channels.imp-ch1.transactionCapacity = 1000

imp-agent.sources.avro-imp-source1.channels = imp-ch1
imp-agent.sources.avro-imp-source1.type = avro
imp-agent.sources.avro-imp-source1.bind = 0.0.0.0
imp-agent.sources.avro-imp-source1.port = 41414

imp-agent.sources.avro-imp-source1.interceptors = host1 timestamp1
imp-agent.sources.avro-imp-source1.interceptors.host1.type = host
imp-agent.sources.avro-imp-source1.interceptors.host1.useIP = false
imp-agent.sources.avro-imp-source1.interceptors.timestamp1.type = timestamp

imp-agent.sinks.hdfs-imp-sink1.channel = imp-ch1
imp-agent.sinks.hdfs-imp-sink1.type = hdfs
imp-agent.sinks.hdfs-imp-sink1.hdfs.path = hdfs://mynamenode:8020/flume/impressions/yr=%Y/mo=%m/d=%d/logger=%{host}s1/
imp-agent.sinks.hdfs-imp-sink1.hdfs.filePrefix = Impr
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576

imp-agent.channels = imp-ch1
imp-agent.sources = avro-imp-source1
imp-agent.sinks = hdfs-imp-sink1

My intention with the configuration above is to write to HDFS in batches of 10 and then roll the file being written to hourly. What I am seeing is that all of the data appears to be held in memory, since I stay under 64 MB, until the file rolls after 1 hour. Are there any settings I should be tweaking in order to get my desired behavior?

To answer my own question: Flume is writing the data to HDFS in batches. The reported file length simply lags while the file is open, because a block is still in the process of being written to.
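
For reference, here is a minimal sketch of the sink options involved, as documented for the Flume-NG HDFS sink (the values are illustrative, not a recommendation): hdfs.batchSize controls how many events are written before the sink flushes to HDFS, while the roll* settings only decide when the in-progress .tmp file is closed and renamed.

# batchSize: events written to the file before a flush to HDFS;
# after each flush the data is on HDFS even if the listed length lags.
imp-agent.sinks.hdfs-imp-sink1.hdfs.batchSize = 10
# roll settings: when to close and rename the in-progress file.
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollInterval = 3600
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollSize = 66584576
imp-agent.sinks.hdfs-imp-sink1.hdfs.rollCount = 0
# idleTimeout (0 = disabled, the default) would additionally close a file
# after this many seconds without new events.
imp-agent.sinks.hdfs-imp-sink1.hdfs.idleTimeout = 0

You can usually confirm that the batches are actually landing by reading the open .tmp file directly (for example with hdfs dfs -cat); the length shown by a plain listing only catches up once the block is closed.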
