簡體   English   中英

將數據寫入水槽,然后寫入HDFS

[英]Writing data into flume and then to HDFS

我正在使用水槽1.5.0.1和hadoop 2.4.1,嘗試將字符串放入水槽並保存到HDFS。 Flume配置文件如下:

    agentMe.channels = memory-channel
agentMe.sources = my-source AvroSource
agentMe.sinks = log-sink hdfs-sink

agentMe.sources.AvroSource.channels = memory-channel
agentMe.sources.AvroSource.type = avro
agentMe.sources.AvroSource.bind = 0.0.0.0 # i tried client ip as well
agentMe.sources.AvroSource.port = 41414

agentMe.channels.memory-channel.type = memory
agentMe.channels.memory-channel.capacity = 1000
agentMe.channels.memory-channel.transactionCapacity = 100

agentMe.sources.my-source.type = netcat
agentMe.sources.my-source.bind = 127.0.0.1 #If i use any other IP like the client from where the string is going to come from then i get unable to bind exception.
agentMe.sources.my-source.port = 9876
agentMe.sources.my-source.channels = memory-channel


# Define a sink that outputs to hdfs.
agentMe.sinks.hdfs-sink.channel = memory-channel
agentMe.sinks.hdfs-sink.type = hdfs
agentMe.sinks.hdfs-sink.hdfs.path = hdfs://localhost:54310/user/netlog/flume.txt
agentMe.sinks.hdfs-sink.hdfs.fileType = DataStream
agentMe.sinks.hdfs-sink.hdfs.batchSize = 2
agentMe.sinks.hdfs-sink.hdfs.rollCount = 0
agentMe.sinks.hdfs-sink.hdfs.inUsePrefix = tcptest-
agentMe.sinks.hdfs-sink.hdfs.inUseSuffix = .txt
agentMe.sinks.hdfs-sink.hdfs.rollSize = 0
agentMe.sinks.hdfs-sink.hdfs.rollInterval = 3
agentMe.sinks.hdfs-sink.hdfs.writeFormat = Text
agentMe.sinks.hdfs-sink.hdfs.path = /user/name/%y-%m-%d/%H%M/%S

我已經在這里提出了同樣的問題

client.sendDataToFlume("hello world")

我看到NettyAvroRpcClient無法連接到運行水槽的服務器。 但是我只是發送一個簡單的字符串,我錯過了什么。

專家建議

如我所見,如果要與NettyAvroRpcClient連接,實際上需要設置Avro RPC源。 示例配置如下:

# Define an Avro source called AvroSource on SpoolAgent and tell it
# to bind to 0.0.0.0:41414. Connect it to channel MemChannel.
agentMe.sources.AvroSource.channels = MemChannel
agentMe.sources.AvroSource.type = avro
agentMe.sources.AvroSource.bind = 0.0.0.0
agentMe.sources.AvroSource.port = 41414

這將在端口41414上創建一個AvroRPC源。

配置必須正確,否則可能無法解決問題。 因此,這里是將數據讀入水槽,然后讀入HDFS的配置。

a1.sources = r1
a1.sinks =  k2
a1.channels = c1

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.interceptors = a
a1.sources.r1.interceptors.a.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.fileType = DataStream
a1.sinks.k2.hdfs.batchSize = 10
a1.sinks.k2.hdfs.rollCount = 10
a1.sinks.k2.hdfs.rollSize = 10
a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.writeFormat = Text
a1.sinks.k2.hdfs.path = /user/flume/%y-%m-%d/%H%M/

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k2.channel = c1

這應該有幫助:)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM