簡體   English   中英

Flume HDFS Sink無法從Kafka通道在hdfs中創建文件

[英]Flume HDFS sink is not creating files in hdfs from Kafka channel

我正在嘗試實現一個簡單的Flume HDFS接收器,該接收器將從Kafka通道獲取事件,並將其作為文本文件寫入hdfs。

該體系結構非常簡單。 這些事件從Twitter流式傳輸並發送到kafka主題,並且水槽hdfs接收器確實會將這些事件寫入hdfs。 這是Kafka-producer stackoverflow問題的第二部分

我執行此命令時沒有錯誤,並且看起來工作正常,但是我看不到hdfs中的文本文件。 我無法調試或調查,因為/var/log/flume/文件夾中沒有創建/var/log/flume/文件。 我正在使用Hortonworks沙盒2.3.1,並使用色相瀏覽文件系統。


執行水槽的命令: flume-ng agent -n KafkaSink -c conf -f tweets_sink_flume.properties

Flume屬性文件: tweets_sink_flume.properties

# agent name, in this case KafkaSink.
KafkaSink.sources  = kafka-source-1
KafkaSink.channels = hdfs-channel-1
KafkaSink.sinks    = hdfs-sink-1

# kafka source properties.
KafkaSink.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
KafkaSink.sources.kafka-source-1.zookeeperConnect = sandbox.hortonworks.com:2181
KafkaSink.sources.kafka-source-1.topic = raw_json_tweets
KafkaSink.sources.kafka-source-1.batchSize = 100
KafkaSink.sources.kafka-source-1.channels = hdfs-channel-1

# hdfs flume sink properties
KafkaSink.channels.hdfs-channel-1.type   = memory
KafkaSink.sinks.hdfs-sink-1.channel = hdfs-channel-1
KafkaSink.sinks.hdfs-sink-1.type = hdfs
KafkaSink.sinks.hdfs-sink-1.hdfs.writeFormat = Text
KafkaSink.sinks.hdfs-sink-1.hdfs.fileType = DataStream
KafkaSink.sinks.hdfs-sink-1.hdfs.filePrefix = test-events
KafkaSink.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true

# created /user/ruben/flume/ folder in hdfs to avoid permission error issues
KafkaSink.sinks.hdfs-sink-1.hdfs.path = /user/ruben/flume/%{topic}/%y-%m-%d
KafkaSink.sinks.hdfs-sink-1.hdfs.rollCount=100
KafkaSink.sinks.hdfs-sink-1.hdfs.rollSize=0

# specify the capacity of the memory channel.
KafkaSink.channels.hdfs-channel-1.capacity = 10000
KafkaSink.channels.hdfs-channel-1.transactionCapacity = 1000

這是相關的Flume控制台輸出的一部分:

..
16/08/26 22:20:48 INFO kafka.KafkaSource: Kafka source kafka-source-1 started.
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], begin rebalancing consumer flume_sandbox.hortonworks.com-1472250048016-72a02e7f try #0
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1472250048139] Stopping leader finder thread
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f-leader-finder-thread], Shutting down
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f-leader-finder-thread], Stopped
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f-leader-finder-thread], Shutdown completed
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1472250048139] Stopping all fetchers
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1472250048139] All connections stopped
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], Cleared all relevant queues for this fetcher
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], Cleared the data chunks in all the consumer message iterators
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], Committing all offsets after clearing the fetcher queues
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], Releasing partition ownership
16/08/26 22:36:38 INFO consumer.RangeAssignor: Consumer flume_sandbox.hortonworks.com-1472250048016-72a02e7f rebalancing the following partitions: ArrayBuffer(0) for topic raw_json_tweets with consumers: List(flume_sandbox.hortonworks.com-1472250048016-72a02e7f-0)
16/08/26 22:36:38 INFO consumer.RangeAssignor: flume_sandbox.hortonworks.com-1472250048016-72a02e7f-0 attempting to claim partition 0
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], flume_sandbox.hortonworks.com-1472250048016-72a02e7f-0 successfully owned partition 0 for topic raw_json_tweets
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], Consumer flume_sandbox.hortonworks.com-1472250048016-72a02e7f selected partitions : raw_json_tweets:0: fetched offset = -1: consumed offset = -1
16/08/26 22:36:38 INFO consumer.ZookeeperConsumerConnector: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f], end rebalancing consumer flume_sandbox.hortonworks.com-1472250048016-72a02e7f try #0
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager$LeaderFinderThread: [flume_sandbox.hortonworks.com-1472250048016-72a02e7f-leader-finder-thread], Starting
16/08/26 22:36:38 INFO utils.VerifiableProperties: Verifying properties
16/08/26 22:36:38 INFO utils.VerifiableProperties: Property client.id is overridden to flume
16/08/26 22:36:38 INFO utils.VerifiableProperties: Property metadata.broker.list is overridden to sandbox.hortonworks.com:6667
16/08/26 22:36:38 INFO utils.VerifiableProperties: Property request.timeout.ms is overridden to 30000
16/08/26 22:36:38 INFO client.ClientUtils$: Fetching metadata from broker id:0,host:sandbox.hortonworks.com,port:6667 with correlation id 0 for 1 topic(s) Set(raw_json_tweets)
16/08/26 22:36:38 INFO producer.SyncProducer: Connected to sandbox.hortonworks.com:6667 for producing
16/08/26 22:36:38 INFO producer.SyncProducer: Disconnecting from sandbox.hortonworks.com:6667
16/08/26 22:36:38 INFO consumer.ConsumerFetcherThread: [ConsumerFetcherThread-flume_sandbox.hortonworks.com-1472250048016-72a02e7f-0-0], Starting
16/08/26 22:36:38 INFO consumer.ConsumerFetcherManager: [ConsumerFetcherManager-1472250048139] Added fetcher for partitions ArrayBuffer([[raw_json_tweets,0], initOffset -1 to broker id:0,host:sandbox.hortonworks.com,port:6667] )

默認情況下,Flume的Kafka來源僅會獲得在啟動帶有該來源的代理后寫入Kafka的消息-它不會從一開始就消耗該主題。

所以:

  1. 確保在Flume啟動后繼續寫該主題
  2. 要求Kafka源代碼從一開始就使用readSmallestOffset = true配置開始。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM