
How to transfer files using Kafka

My process creates a huge number of files from time to time, and I want to transfer them from my local directory to some location in HDFS. Other than using NiFi, is it possible to develop that flow in Java? If yes, please guide me by giving some reference code in Java.

Please help me out!

You could do a couple of things:

1) Use Apache Flume: https://www.dezyre.com/hadoop-tutorial/flume-tutorial . That page says: "Apache Flume is a distributed system used for aggregating the files to a single location." This solution should be better than using Kafka, since Flume was designed specifically for files.

2) Write Java code that connects to your machine over SSH and scans for files modified after a specific timestamp. If you find such files, open an input stream and save the contents on the machine your Java code is running on.

3) Alternatively, your Java code could run on the machine where the files are being created; you could scan for files created after a specific timestamp and move them to any new machine (see the scanning sketch after this list).

4) If you want to use only Kafka, you could write Java code that reads the files, finds the latest file/row, and publishes it to a Kafka topic (a producer sketch also follows below). Flume can do all of this out of the box.
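For options 2 and 3, a minimal sketch of the scanning step might look like the following. It uses only java.nio.file; the directory paths and the one-minute cutoff are assumptions for illustration:

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.util.stream.Stream;

public class NewFileScanner {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/data/incoming");  // hypothetical directory your process writes to
        Path dest = Paths.get("/data/staging");     // hypothetical pickup directory
        // Only consider files modified in the last minute (adjust to your schedule)
        FileTime cutoff = FileTime.fromMillis(System.currentTimeMillis() - 60_000);

        try (Stream<Path> files = Files.list(source)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> {
                     try {
                         return Files.getLastModifiedTime(p).compareTo(cutoff) > 0;
                     } catch (IOException e) {
                         return false; // skip files we cannot stat
                     }
                 })
                 .forEach(p -> {
                     try {
                         Files.move(p, dest.resolve(p.getFileName()),
                                    StandardCopyOption.REPLACE_EXISTING);
                     } catch (IOException e) {
                         e.printStackTrace(); // keep scanning the remaining files
                     }
                 });
        }
    }
}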

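For option 4, here is a minimal producer sketch, assuming a local broker and a topic named file-topic (both placeholders). It sends one whole file as a single byte[] message, keyed by the file name:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class FileProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // Read the whole file into memory; fine for small files,
            // large files would need chunking (see the size limits below).
            byte[] payload = Files.readAllBytes(Paths.get("/data/staging/example.txt"));
            producer.send(new ProducerRecord<>("file-topic", "example.txt", payload)).get();
        }
    }
}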
I don't know if there is a limit on the size of a message in Kafka, but you can use the ByteArraySerializer in the producer/consumer properties: convert your file to bytes on the producer side and then reconstruct it on the consumer.
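A matching consumer sketch for the reconstruction step, again with a placeholder topic, group id, and output directory; it uses the ByteArrayDeserializer and writes each message's value back out under its key:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class FileConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "file-consumers");          // placeholder group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("file-topic"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, byte[]> record : records) {
                    // Reconstruct the file from the raw bytes, named after the message key
                    Files.write(Paths.get("/data/restored", record.key()), record.value());
                }
            }
        }
    }
}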

Doing a quick search, I found this:

message.max.bytes (default: 1000000) – maximum size of a message the broker will accept. This has to be smaller than the consumer fetch.message.max.bytes, or the broker will have messages that can't be consumed, causing consumers to hang.
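Note that fetch.message.max.bytes belongs to the old consumer; with the modern Java clients the corresponding settings are max.request.size on the producer and max.partition.fetch.bytes on the consumer. A sketch of raising the limits to 10 MB (a value chosen purely for illustration), added to the Properties objects in the sketches above:

// Broker side, in server.properties (not Java): message.max.bytes=10485760
props.put("max.request.size", "10485760");          // producer: must not exceed the broker limit
props.put("max.partition.fetch.bytes", "10485760"); // consumer: must be >= the largest message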
