
Formatting Apache Flume HDFS Serializer

I'm just getting started with Flume and need to insert some headers into the output written by the HDFS sink.

I have this working, although the format is wrong and I can't control the columns.

Using this configuration:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = syslogudp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 44444

a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.preserveExisting = false
a1.sources.r1.interceptors.i1.hostHeader = hostname

a1.sources.r1.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
a1.sources.r1.interceptors.i2.preserveExisting = false

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/vagrant/syslog/%y-%m-%d/
a1.sinks.k1.hdfs.rollInterval = 120
a1.sinks.k1.hdfs.rollCount = 100
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

a1.sinks.k1.serializer = header_and_text
a1.sinks.k1.serializer.columns = timestamp hostname
a1.sinks.k1.serializer.format = CSV
a1.sinks.k1.serializer.appendNewline = true

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The logs written to HDFS are mostly fine, apart from how the headers are serialized:

{timestamp=1415574695138, Severity=6, host=PolkaSpots, Facility=3, hostname=127.0.1.1} hostapd: wlan0-1: STA xx WPA: group key handshake completed (RSN)

How can I format the logs so they look like this:

1415574695138 127.0.1.1 hostapd: wlan0-1: STA xx WPA: group key handshake completed (RSN)

Timestamp first, followed by the hostname, and then the syslog message body.

The reason for this is that the two interceptors you've configured write their values into Flume event headers, which are then serialized in front of the body by the HeaderAndBodyTextEventSerializer. The latter simply does this:

public void write(Event e) throws IOException {
  out.write((e.getHeaders() + " ").getBytes());
  out.write(e.getBody());
  if (appendNewline) {
    out.write('\n');
  }
}

Concatenating e.getHeaders() with a string just calls Map.toString(), which produces the {key=value, ...} dump you see in front of each message; the serializer.columns and serializer.format settings are never consulted.

To fix this, I would suggest creating your own serializer and overriding the write() method to format the output as tab-separated values. You would then just need to point the sink at your class:

a1.sinks.k1.serializer = com.mycompany.MySerializer

and drop the jar on Flume's classpath.
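
Here is a minimal sketch of what such a serializer might look like. It assumes the header names set by your interceptors are timestamp and hostname (as in your configuration), and the package and class names are placeholders:

import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

// Hypothetical serializer: writes the timestamp and hostname headers,
// then the event body, separated by tabs, one event per line.
public class MySerializer implements EventSerializer {

  private final OutputStream out;

  private MySerializer(OutputStream out) {
    this.out = out;
  }

  @Override
  public void afterCreate() throws IOException {
    // no file header to write
  }

  @Override
  public void afterReopen() throws IOException {
    // nothing to do
  }

  @Override
  public void write(Event e) throws IOException {
    Map<String, String> headers = e.getHeaders();
    // "timestamp" and "hostname" are the headers set by the two interceptors
    out.write((headers.get("timestamp") + "\t").getBytes());
    out.write((headers.get("hostname") + "\t").getBytes());
    out.write(e.getBody());
    out.write('\n');
  }

  @Override
  public void flush() throws IOException {
    // the sink flushes the underlying stream
  }

  @Override
  public void beforeClose() throws IOException {
    // nothing to do
  }

  @Override
  public boolean supportsReopen() {
    return false;
  }

  // Flume instantiates custom serializers through a nested Builder class.
  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      return new MySerializer(out);
    }
  }
}

Note that, if I remember correctly, when a custom serializer is configured by fully qualified class name, Flume expects the name of the Builder class, so the sink line would look more like:

a1.sinks.k1.serializer = com.mycompany.MySerializer$Builder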
