
Flume not enough space error while data flow from Kafka to HDFS

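We are struggling with a data flow from Kafka to HDFS that is managed by Flume. The data is not fully transferred to HDFS because of the exceptions shown below. The error looks misleading to us, though: we have enough space both in the data directory and in HDFS. We suspect a problem with the channel configuration, but we use a similar configuration for other sources and those work correctly. If anyone has had to deal with this problem, I would be grateful for any tips.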

17 Aug 2017 14:15:24,335 ERROR [Log-BackgroundWorker-channel1] (org.apache.flume.channel.file.Log$BackgroundWorker.run:1204)  - Error doing checkpoint
java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288000 bytes
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1003)
        at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:986)
        at org.apache.flume.channel.file.Log.access$200(Log.java:75)
        at org.apache.flume.channel.file.Log$BackgroundWorker.run(Log.java:1201)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
17 Aug 2017 14:15:27,552 ERROR [PollableSourceRunner-KafkaSource-kafkaSource] (org.apache.flume.source.kafka.KafkaSource.doProcess:305)  - KafkaSource EXCEPTION, {}
org.apache.flume.ChannelException: Commit failed due to IO error [channel=channel1]
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:639)
        at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
        at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:194)
        at org.apache.flume.source.kafka.KafkaSource.doProcess(KafkaSource.java:286)
        at org.apache.flume.source.AbstractPollableSource.process(AbstractPollableSource.java:58)
        at org.apache.flume.source.PollableSourceRunner$PollingRunner.run(PollableSourceRunner.java:137)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Usable space exhausted, only 0 bytes remaining, required 524288026 bytes
        at org.apache.flume.channel.file.Log.rollback(Log.java:722)
        at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:637)
        ... 6 more

Flume configuration:

agent2.sources = kafkaSource

#sources defined
agent2.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent2.sources.kafkaSource.kafka.bootstrap.servers = …
agent2.sources.kafkaSource.kafka.topics = pega-campaign-response
agent2.sources.kafkaSource.channels = channel1

# channels defined
agent2.channels = channel1

agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
agent2.channels.channel1.capacity = 10000
agent2.channels.channel1.transactionCapacity = 10000

#hdfs sinks

agent2.sinks = sink

agent2.sinks.sink.type = hdfs
agent2.sinks.sink.hdfs.fileType = DataStream
agent2.sinks.sink.hdfs.path = hdfs://bigdata-cls:8020/stage/data/pega/campaign-response/%d%m%Y
agent2.sinks.sink.hdfs.batchSize = 1000
agent2.sinks.sink.hdfs.rollCount = 0
agent2.sinks.sink.hdfs.rollSize = 0
agent2.sinks.sink.hdfs.rollInterval = 120
agent2.sinks.sink.hdfs.useLocalTimeStamp = true
agent2.sinks.sink.hdfs.filePrefix = pega-

Output of df -h:

Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   26G  6.8G   18G  28% /
devtmpfs               126G     0  126G   0% /dev
tmpfs                  126G  6.3M  126G   1% /dev/shm
tmpfs                  126G  2.9G  123G   3% /run
tmpfs                  126G     0  126G   0% /sys/fs/cgroup
/dev/sda1              477M  133M  315M  30% /boot
tmpfs                   26G     0   26G   0% /run/user/0
cm_processes           126G  1.9G  124G   2% /run/cloudera-scm-agent/process
/dev/scinib            2.0T   53G  1.9T   3% /data
tmpfs                   26G   20K   26G   1% /run/user/2000

Change the channel type to the memory channel and test with it, to isolate the disk space problem: agent2.channels.channel1.type = memory

Also, since you already have Kafka in your setup, you can use it as the Flume channel.

https://flume.apache.org/FlumeUserGuide.html#kafka-channel
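A minimal sketch of what that could look like for the agent in the question, assuming a Flume version that ships the Kafka channel with the kafka.* property names (the source in the question already uses that naming style). The topic name flume-channel-pega and the consumer group flume-pega are hypothetical, and the broker list is left elided as in the question:

```properties
# Replace the file channel with a Kafka-backed channel (sketch, not tested against this cluster)
agent2.channels = channel1
agent2.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
agent2.channels.channel1.kafka.bootstrap.servers = …
# Hypothetical topic used only as the channel buffer; keep it separate from the source topic
agent2.channels.channel1.kafka.topic = flume-channel-pega
# Hypothetical consumer group for the channel
agent2.channels.channel1.kafka.consumer.group.id = flume-pega
```

With this channel type the events are buffered in Kafka rather than on the agent's local disk, so the file channel's free-space check no longer applies.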

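Your error does not refer to free space in HDFS. It refers to free space on the local disk that holds the files used by your channel. If you look at the File Channel section of the Flume User Guide, you will see that the default value of minimumRequiredSpace is 524288000 bytes, which is the "required" figure in your log. Check whether the available local space is sufficient (according to your error, it appears to be 0). You can also change the minimumRequiredSpace property.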
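As a sketch of that last option, the property can be set on the file channel from the question. The value 104857600 (100 MB) is only an illustrative assumption, not a recommendation:

```properties
agent2.channels.channel1.type = file
agent2.channels.channel1.checkpointDir = /data/cloudera/.flume/filechannel/checkpointdirs/pega
agent2.channels.channel1.dataDirs = /data/cloudera/.flume/filechannel/datadirs/pega
# Default is 524288000 bytes (500 MB); 104857600 (100 MB) is an illustrative value only
agent2.channels.channel1.minimumRequiredSpace = 104857600
```

Note that lowering the threshold is unlikely to help on its own while the agent still sees 0 usable bytes, because 0 is below any positive threshold. In that case it is probably worth checking first that the checkpoint and data directories exist and are accessible to the user running the Flume agent.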
