
Flume HDFS sink only stores one line of data source using netcat source

I am trying to load data into HDFS using Flume 1.7. I created the following configuration:

# Starting with: /opt/flume/bin/flume-ng agent -n Agent -c conf -f /opt/flume/conf/test.conf
# Naming the components on the current agent
Agent.sources = Netcat   
Agent.channels = MemChannel 
Agent.sinks = LoggerSink hdfs-sink LocalOut

# Describing/Configuring the source 
Agent.sources.Netcat.type = netcat 
Agent.sources.Netcat.bind = 0.0.0.0
Agent.sources.Netcat.port = 56565  

# Describing/Configuring the sink 
Agent.sinks.LoggerSink.type = logger  

# Define a sink that outputs to hdfs.
Agent.sinks.hdfs-sink.type = hdfs
Agent.sinks.hdfs-sink.hdfs.path = hdfs://<<IP of HDFS node>>:8020/user/admin/flume_folder/%y-%m-%d/%H%M/
Agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
Agent.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent.sinks.hdfs-sink.hdfs.writeFormat = Text
Agent.sinks.hdfs-sink.hdfs.batchSize = 100
Agent.sinks.hdfs-sink.hdfs.rollSize = 0
Agent.sinks.hdfs-sink.hdfs.rollCount = 0
Agent.sinks.hdfs-sink.hdfs.rollInterval = 0
Agent.sinks.hdfs-sink.hdfs.idleTimeout = 0

# Writes input to the local filesystem
#http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
Agent.sinks.LocalOut.type = file_roll  
Agent.sinks.LocalOut.sink.directory = /tmp/flume
Agent.sinks.LocalOut.sink.rollInterval = 0  


# Describing/Configuring the channel 
Agent.channels.MemChannel.type = memory 
Agent.channels.MemChannel.capacity = 1000 
Agent.channels.MemChannel.transactionCapacity = 100 

# Bind the source and sink to the channel 
Agent.sources.Netcat.channels = MemChannel
Agent.sinks.LoggerSink.channel = MemChannel
Agent.sinks.hdfs-sink.channel = MemChannel
Agent.sinks.LocalOut.channel = MemChannel

After that, I sent the following file to the source using netcat:

cat textfile.csv | nc <IP of flume agent> 56565

The file contains the following elements:

Name1,1
Name2,2
Name3,3
Name4,4
Name5,5
Name6,6
Name7,7
Name8,8
Name9,9
Name10,10
Name11,11
Name12,12
Name13,13
Name14,14
Name15,15
Name16,16
Name17,17
Name18,18
Name19,19
Name20,20
...
Name490,490
Name491,491
Name492,492

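The problem I am facing is that Flume writes to HDFS without any error, but it writes only one line of the transferred file. If I push the file to the source several times with netcat, Flume sometimes writes more than one file to HDFS, and those files sometimes contain more than one line, but seldom all of the lines.

I tried changing the HDFS sink parameters rollSize, batchSize and others, but this did not really change the behavior.

The sink to the local file, which is also configured, works fine.

Does anyone know how to configure it so that all entries are written to HDFS without losing any?

Thank you for your help.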


Update 1.12.2016

I removed all sinks except the HDFS sink and changed some parameters. After that, the HDFS sink performs as expected.

Here is the configuration:

# Naming the components on the current agent
Agent.sources = Netcat   
Agent.channels = MemChannel 
Agent.sinks = hdfs-sink 

# Describing/Configuring the source 
Agent.sources.Netcat.type = netcat 
Agent.sources.Netcat.bind = 0.0.0.0
Agent.sources.Netcat.port = 56565  


# Define a sink that outputs to hdfs.
Agent.sinks.hdfs-sink.type = hdfs
Agent.sinks.hdfs-sink.hdfs.path = hdfs://<<IP of HDFS node>>/user/admin/flume_folder/%y-%m-%d/%H%M/
Agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
Agent.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent.sinks.hdfs-sink.hdfs.writeFormat = Text
Agent.sinks.hdfs-sink.hdfs.batchSize = 100
Agent.sinks.hdfs-sink.hdfs.rollSize = 0
Agent.sinks.hdfs-sink.hdfs.rollCount = 100


# Describing/Configuring the channel 
Agent.channels.MemChannel.type = memory 
Agent.channels.MemChannel.capacity = 1000 
Agent.channels.MemChannel.transactionCapacity = 100 

# Bind the source and sink to the channel 
Agent.sources.Netcat.channels = MemChannel
Agent.sinks.hdfs-sink.channel = MemChannel

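Does anyone know why it works with this configuration, but no longer works with two or more sinks?

I found the solution myself. As I understand it, I was using the same channel for all of the sinks. A channel hands each event to only one sink, so the faster sinks took most of the entries and only some of them reached the HDFS sink.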

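After giving each sink its own channel and fanning out the source to those channels with the parameter

Agent.sources.Netcat.selector.type = replicating

Flume writes to the local file and to HDFS as expected.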
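
For reference, a minimal sketch of what such a fan-out configuration could look like, assuming the same agent, source and sink names as above. The channel names HdfsChannel and LocalChannel are hypothetical, since the final configuration was not posted, and the HDFS path placeholder is left as in the question. The logger sink is omitted here; it would need a third channel of its own in the same way.

```properties
# Sketch only: one channel per sink, source replicated to both channels.
# HdfsChannel and LocalChannel are illustrative names, not from the original post.
Agent.sources = Netcat
Agent.channels = HdfsChannel LocalChannel
Agent.sinks = hdfs-sink LocalOut

# Netcat source, unchanged from the question
Agent.sources.Netcat.type = netcat
Agent.sources.Netcat.bind = 0.0.0.0
Agent.sources.Netcat.port = 56565

# Copy every event into each channel listed for the source
Agent.sources.Netcat.selector.type = replicating
Agent.sources.Netcat.channels = HdfsChannel LocalChannel

# One memory channel per sink, so the sinks no longer compete for events
Agent.channels.HdfsChannel.type = memory
Agent.channels.HdfsChannel.capacity = 1000
Agent.channels.HdfsChannel.transactionCapacity = 100
Agent.channels.LocalChannel.type = memory
Agent.channels.LocalChannel.capacity = 1000
Agent.channels.LocalChannel.transactionCapacity = 100

# HDFS sink reads only from HdfsChannel
Agent.sinks.hdfs-sink.type = hdfs
Agent.sinks.hdfs-sink.hdfs.path = hdfs://<<IP of HDFS node>>/user/admin/flume_folder/%y-%m-%d/%H%M/
Agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
Agent.sinks.hdfs-sink.hdfs.fileType = DataStream
Agent.sinks.hdfs-sink.hdfs.writeFormat = Text
Agent.sinks.hdfs-sink.hdfs.batchSize = 100
Agent.sinks.hdfs-sink.hdfs.rollSize = 0
Agent.sinks.hdfs-sink.hdfs.rollCount = 100
Agent.sinks.hdfs-sink.channel = HdfsChannel

# Local file sink reads only from LocalChannel
Agent.sinks.LocalOut.type = file_roll
Agent.sinks.LocalOut.sink.directory = /tmp/flume
Agent.sinks.LocalOut.sink.rollInterval = 0
Agent.sinks.LocalOut.channel = LocalChannel
```

Replicating is also the selector Flume uses when none is specified, so the essential change is most likely the separate channel per sink rather than the selector line itself.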

