
Flume script gives Warning: No configuration directory set! Use --conf <dir> to override

This is my configuration file below. It was working before, but suddenly started giving errors. What I am actually trying to do is move all the logs from local to HDFS; each log should be moved to HDFS as one file, not in pieces:

#create source, channels, and sink

agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1

#bind the source and sink to the channel

agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1

#Specify the source type and directory
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir

#Specify the Sink type, directory, and parameters
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
A1.sinks.H1.hdfs.fileType=DataStream

#Specify the channel type (Memory vs File)
agent1.channels.C1.type=file

I run my agent with this command:

flume-ng agent --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1

then I get this warning:

Warning: No configuration directory set! Use --conf <dir> to override.

also

16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration for 'A1' does not contain any channels. Marking it as invalid.
16/10/14 16:22:37 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'A1'. It will be removed.

then it just keeps creating, closing, and renaming the same log on HDFS forever, like this:

16/10/14 16:22:41 INFO node.Application: Starting Sink H1
16/10/14 16:22:41 INFO node.Application: Starting Source S1
16/10/14 16:22:41 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /tmp/spooldir
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: H1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: H1 started
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: S1: Successfully registered new MBean.
16/10/14 16:22:41 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: S1 started
16/10/14 16:22:41 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
16/10/14 16:22:42 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561961.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561961.log.tmp to /user/hive/warehouse/events.1476476561961.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561962.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561962.log.tmp to /user/hive/warehouse/events.1476476561962.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561963.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561963.log.tmp to /user/hive/warehouse/events.1476476561963.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561964.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Renaming /user/hive/warehouse/processingevents.1476476561964.log.tmp to /user/hive/warehouse/events.1476476561964.log
16/10/14 16:22:44 INFO hdfs.BucketWriter: Creating /user/hive/warehouse/processingevents.1476476561965.log.tmp
16/10/14 16:22:44 INFO hdfs.BucketWriter: Closing /user/hive/warehouse/processingevents.1476476561965.log.tmp
:
:
:

Why does Flume keep writing the same file to HDFS forever? How can I move each log from local to HDFS without breaking it into parts? My logs are usually between 50 KB and 300 KB.

Update: additional warnings:

16/10/18 10:10:05 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)

16/10/18 10:10:05 WARN file.ReplayHandler: Ignoring /home/USER/.flume/file-channel/data/log-18 due to EOF
java.io.EOFException
    at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
    at org.apache.flume.channel.file.LogFileFactory.getSequentialReader(LogFileFactory.java:169)
    at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:264)
    at org.apache.flume.channel.file.Log.doReplay(Log.java:529)
    at org.apache.flume.channel.file.Log.replay(Log.java:455)
    at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:295)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The conf folder is used by Flume to pull JRE and logging properties from; you can fix the warning by passing the --conf argument as noted:

flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/spoolingToHDFS.conf --name agent1
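The conf directory is where the flume-ng script picks up flume-env.sh and where log4j.properties is read from. As a rough illustration (the exact file names depend on your installation), the directory from your command might contain:

ls /usr/local/flume/conf
flume-env.sh  log4j.properties  spoolingToHDFS.conf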

The warning about A1 is because you probably have a typo near the end of your agent configuration file:

A1.sinks.H1.hdfs.fileType=DataStream

which should read

agent1.sinks.H1.hdfs.fileType=DataStream
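A quick way to spot stray agent-name prefixes like this is to list the distinct prefixes used in the file, for example (using the config path from your command):

grep -v '^#' /usr/local/flume/conf/spoolingToHDFS.conf | grep . | cut -d. -f1 | sort -u

Any prefix other than agent1 (here, the stray A1) belongs to an agent that Flume evaluates separately and, lacking any channels, marks as invalid.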

As for the files - you haven't configured a deserializer for the spoolDir source, and the default is LINE, so you're getting one HDFS file per line of the files in your spoolDir. Use the BlobDeserializer if you want Flume to treat each whole file as a single event ( https://flume.apache.org/FlumeUserGuide.html#blobdeserializer ):

agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
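Putting the fixes together, a corrected configuration would look like the sketch below (it keeps the names and paths you already use, with the A1 prefix corrected and the deserializer line added):

#create source, channel, and sink
agent1.sources=S1
agent1.sinks=H1
agent1.channels=C1

#bind the source and sink to the channel
agent1.sources.S1.channels=C1
agent1.sinks.H1.channel=C1

#source: read each spooled file as a single event
agent1.sources.S1.type=spooldir
agent1.sources.S1.spoolDir=/tmp/spooldir
agent1.sources.S1.deserializer=org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

#sink: write events to HDFS as a plain data stream
agent1.sinks.H1.type=HDFS
agent1.sinks.H1.hdfs.path=/user/hive/warehouse
agent1.sinks.H1.hdfs.filePrefix=events
agent1.sinks.H1.hdfs.fileSuffix=.log
agent1.sinks.H1.hdfs.inUsePrefix=processing
agent1.sinks.H1.hdfs.fileType=DataStream

#channel type (Memory vs File)
agent1.channels.C1.type=file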
