
Several Flume-ng hdfs sinks write to same path

I want to understand how flume-ng handles file name collisions in this situation. Assume I have several identically configured Flume agents, and the client uses them as a load-balancing group.

a1.sinks.k1.hdfs.path = /flume/events/path

How do the Flume agents generate file names that are unique across agents? Is the agent name appended somehow? (The names look like numbers, so it is hard to tell.)

Flume does not solve this problem automatically. By default the HDFS sink names each new file after the current timestamp in milliseconds (following the default prefix FlumeData), so a collision can occur if two agents create files at the same moment.

One way to fix this is to manually set a different file prefix in each sink:

a1.sinks.k1.hdfs.filePrefix = agentX
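
As a rough sketch of that approach, assuming two hosts that each run an agent named a1 with its own configuration file, the sinks could share the path and differ only in the prefix. The prefix values agent1 and agent2 are made-up examples, not anything Flume requires:

```properties
# Configuration file on the first host (prefix value is a hypothetical example)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/path
a1.sinks.k1.hdfs.filePrefix = agent1

# Configuration file on the second host: same path, different prefix
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/path
a1.sinks.k1.hdfs.filePrefix = agent2
```

With this setup the numeric part of two file names may still coincide, but the differing prefixes keep the full names distinct.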

You can also use event headers in the prefix definition. For example, if you use the host interceptor, which adds a "host" header containing the agent's hostname to each event, you can do something like this:

a1.sinks.k1.hdfs.filePrefix = %{host}
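
For the header to exist, the host interceptor has to be attached to the agent's source. A minimal sketch follows, assuming the source is named r1 and the interceptor i1 (both names are placeholders, not taken from the question). Note that Flume's escape syntax for event headers in sink paths and prefixes is %{header}:

```properties
# Attach the host interceptor to the source (r1 and i1 are assumed names)
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
# Write the hostname rather than the IP address into the header
a1.sources.r1.interceptors.i1.useIP = false
# Header name to populate; "host" is also the default
a1.sources.r1.interceptors.i1.hostHeader = host

# Use the header value as the file prefix in the HDFS sink
a1.sinks.k1.hdfs.path = /flume/events/path
a1.sinks.k1.hdfs.filePrefix = %{host}
```

Because every agent can then use the same configuration file, this fits a load-balancing group better than hand-assigned prefixes, provided each agent runs on a different host.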

If you need to generate unique file names completely automatically, you can develop your own interceptor that adds a UUID header to events. See examples here.
