Flume ng / Avro source, memory channel and HDFS Sink - Too many small files

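I'm facing a strange issue. I want to aggregate a large amount of information from Flume into HDFS. I applied the recommended configuration to avoid too many small files, but it didn't work. Here is my configuration file.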

# single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5458
a1.sources.r1.threads = 20

# Describe the HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://myhost:myport/user/myuser/flume/events/%{senderType}/%{senderName}/%{senderEnv}/%y-%m-%d/%H%M
a1.sinks.k1.hdfs.filePrefix = logs-
a1.sinks.k1.hdfs.fileSuffix = .jsonlog
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#never roll-based on time
a1.sinks.k1.hdfs.rollInterval=0
##10MB=10485760, 128MB=134217728, 256MB=268435456
a1.sinks.kl.hdfs.rollSize=10485760
##never roll base on number of events
a1.sinks.kl.hdfs.rollCount=0
a1.sinks.kl.hdfs.round=false

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 5000
a1.channels.c1.transactionCapacity = 1000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

This configuration works and I can see my files. But the average file size is 1.5 KB. The Flume console output shows this kind of information.

16/08/03 09:48:31 INFO hdfs.BucketWriter: Creating  hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484507.jsonlog.tmp
16/08/03 09:48:31 INFO hdfs.BucketWriter: Closing hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484507.jsonlog.tmp
16/08/03 09:48:31 INFO hdfs.BucketWriter: Renaming hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484507.jsonlog.tmp to hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484507.jsonlog
16/08/03 09:48:31 INFO hdfs.BucketWriter: Creating hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484508.jsonlog.tmp
16/08/03 09:48:31 INFO hdfs.BucketWriter: Closing hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484508.jsonlog.tmp
16/08/03 09:48:31 INFO hdfs.BucketWriter: Renaming hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484508.jsonlog.tmp to hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484508.jsonlog
16/08/03 09:48:31 INFO hdfs.BucketWriter: Creating hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484509.jsonlog.tmp
16/08/03 09:48:31 INFO hdfs.BucketWriter: Closing hdfs://myhost:myport/user/myuser/flume/events/a/b/c/16-08-03/0948/logs-.1470210484509.jsonlog.tmp

Does anyone have an idea about this problem?


Here is some information about Flume's behavior.

The command is:

flume-ng agent -n a1 -c /path/to/flume/conf --conf-file sample-flume.conf -Dflume.root.logger=TRACE,console -Xms8192m -Xmx16384m

Note: the logger directive has no effect. I don't understand why, but I...

Flume's startup output is:

16/08/03 15:32:55 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/08/03 15:32:55 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:sample-flume.conf
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:kl
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:kl
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:kl
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
16/08/03 15:32:55 INFO node.AbstractConfigurationProvider: Creating channels
16/08/03 15:32:55 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
16/08/03 15:32:55 INFO node.AbstractConfigurationProvider: Created channel c1
16/08/03 15:32:55 INFO source.DefaultSourceFactory: Creating instance of source r1, type avro
16/08/03 15:32:55 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: hdfs
16/08/03 15:32:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
16/08/03 15:32:56 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
16/08/03 15:32:56 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:Avro source r1: { bindAddress: 0.0.0.0, port: 5458 } }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@466ab18a counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
16/08/03 15:32:56 INFO node.Application: Starting Channel c1
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
16/08/03 15:32:56 INFO node.Application: Starting Sink k1
16/08/03 15:32:56 INFO node.Application: Starting Source r1
16/08/03 15:32:56 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5458 }...
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
16/08/03 15:32:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
16/08/03 15:32:56 INFO source.AvroSource: Avro source r1 started.

Since I can't get more verbose output, I have to assume that

[...]
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
16/08/03 15:32:55 INFO conf.FlumeConfiguration: Processing:k1
[...]

means that the sink is configured correctly.


PS: I have seen the following answers, but none of them work (I must be missing something...).

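flume-hdfs-sink generates a lot of small files on HDFS

Too many files with the Flume HDFS sink

Flume tiered data flow using various sources and sinks

flume-hdfs-sink keeps rolling small files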

Increase the batch size according to your requirements:

a1.sinks.k1.hdfs.batchSize =
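
One more thing may be worth checking, judging only from the configuration and startup log posted in the question. The rollSize, rollCount and round lines are keyed on a1.sinks.kl (lowercase letter L), while the sink is declared as k1 (digit one), and the startup log does show separate "Processing:kl" entries. If that is not just a transcription slip, those three settings would never reach sink k1. The sink would then fall back to the HDFS sink defaults listed in the Flume user guide (rollSize = 1024 bytes, rollCount = 10 events), which would be consistent with files of roughly 1.5 KB. A minimal sketch of the corrected lines, assuming the 10 MB roll size from the question is what is wanted:

```properties
# never roll based on time
a1.sinks.k1.hdfs.rollInterval = 0
# roll once the file reaches 10 MB (10485760 bytes)
a1.sinks.k1.hdfs.rollSize = 10485760
# never roll based on number of events
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.round = false
```

After a restart, the startup log should no longer contain any "Processing:kl" line if every property is attached to k1.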
