
flume syslog agent not picking the message and placing it into HDFS

I am trying to set up a syslog Flume agent that should eventually put the data into HDFS. My scenario is as follows:

The syslog Flume agent is running on physical server A, with the following configuration:


syslog_agent.sources = syslog_source
syslog_agent.channels = MemChannel
syslog_agent.sinks = HDFS

# Describing/Configuring the source
syslog_agent.sources.syslog_source.type = syslogudp
#syslog_agent.sources.syslog_source.bind = 0.0.0.0
syslog_agent.sources.syslog_source.bind = localhost
syslog_agent.sources.syslog_source.port = 514

# Describing/Configuring the sink
syslog_agent.sinks.HDFS.type=hdfs
syslog_agent.sinks.HDFS.hdfs.path=hdfs://<IP_ADD_OF_NN>:8020/user/ec2-user/syslog
syslog_agent.sinks.HDFS.hdfs.fileType=DataStream
syslog_agent.sinks.HDFS.hdfs.writeFormat=Text
syslog_agent.sinks.HDFS.hdfs.batchSize=1000
syslog_agent.sinks.HDFS.hdfs.rollSize=0
syslog_agent.sinks.HDFS.hdfs.rollCount=10000
syslog_agent.sinks.HDFS.hdfs.rollInterval=600

# Describing/Configuring the channel
syslog_agent.channels.MemChannel.type=memory
syslog_agent.channels.MemChannel.capacity=10000
syslog_agent.channels.MemChannel.transactionCapacity=1000

#Bind sources and sinks to the channel
syslog_agent.sources.syslog_source.channels = MemChannel
syslog_agent.sinks.HDFS.channel = MemChannel

I am sending syslog "logs" from a different physical server B using the built-in utility "logger", like this:

sudo logger --server <IP_Address_physical_server_A> --port 514 --udp
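To see what actually travels over the wire, the sketch below builds a minimal RFC 3164-style syslog datagram (the kind of message `logger --udp` emits) and sends it over UDP. The receiver here is bound to the loopback on an ephemeral port purely so the example is self-contained; real traffic would go to server A on port 514, where the Flume `syslogudp` source listens. The hostname, tag, and timestamp are illustrative, not taken from the question.

```python
import socket

# RFC 3164 format: <PRI>TIMESTAMP HOSTNAME TAG: MESSAGE
# PRI = facility * 8 + severity; user(1) * 8 + notice(5) = 13
PRI = 13
message = f"<{PRI}>Oct 11 22:14:15 server-B logger: test message"

# Receiver standing in for the Flume syslogudp source, bound to
# loopback on an ephemeral port so this runs anywhere.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
port = recv.getsockname()[1]

# Sender standing in for `logger --udp` on server B.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(message.encode("utf-8"), ("127.0.0.1", port))

data, _ = recv.recvfrom(4096)
print(data.decode("utf-8"))
send.close()
recv.close()
```

If a datagram like this reaches the port the Flume source is bound to, the source should turn it into an event; if rsyslogd already owns that port, Flume never sees it.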

I do see the log messages going into physical server A's path /var/log/messages.

But I don't see any messages going into HDFS; it seems the Flume agent isn't able to get any data, even though the messages are going from server B to server A.

Am I doing something wrong here? Can anyone help me resolve this?

EDIT

The following is the output of the netstat command on server A, where the syslog daemon is running:

tcp        0      0 0.0.0.0:514             0.0.0.0:*               LISTEN      573/rsyslogd
tcp6       0      0 :::514                  :::*                    LISTEN      573/rsyslogd
udp        0      0 0.0.0.0:514             0.0.0.0:*                           573/rsyslogd
udp6       0      0 :::514                  :::*                                573/rsyslogd

I'm not sure what logger --server gives you, but most examples I have seen use netcat.

In any case, you've set batchSize=1000, so until you send 1000 messages, Flume will not write to HDFS.
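Since batchSize controls how many events the HDFS sink takes from the channel per transaction, one way to verify the pipeline end-to-end is to temporarily lower the batching and roll thresholds so data shows up after a handful of messages. A testing-only sketch (these values would produce many tiny HDFS files in production and should be reverted):

```
# Testing-only values: flush after every event so data appears quickly
syslog_agent.sinks.HDFS.hdfs.batchSize = 1
syslog_agent.sinks.HDFS.hdfs.rollCount = 10
syslog_agent.sinks.HDFS.hdfs.rollInterval = 30
```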

Keep in mind that HDFS is not a streaming platform, and it prefers not to have small files.

If you're looking for log collection, look into Elasticsearch or Solr fronted by a Kafka topic.
