
MQTT topics and Kafka topics mapping

I have started learning about MQTT because I have a telematics use case in my current organisation. I would like to integrate messages from an MQTT broker (Mosquitto) into my Kafka cluster.

Since every vehicle sends its data to its own topic on the MQTT broker within a single organisation, I would like to push all of this data into Kafka. I know it is not advisable to create that many topics in Kafka (more than a million). At the same time, I would rather not store all the vehicles' data in a single Kafka topic, because I later want to put this data into S3, separated by vehicle id.

How can I achieve this without creating so many topics in Kafka? One option is to have a Kafka consumer segregate the events and write them to S3, but I believe that would produce a lot of small files in S3.

Generally, if you have the same logical entity you would use the same topic.

You can use the MQTT connector for Kafka Connect to stream the data from MQTT into Kafka, together with Kafka Connect's RegexRouter Single Message Transform (SMT) to modify the topic name that messages are written to, and another SMT to set the message key. That way you get all the messages in one topic, partitioned by vehicle id, which is probably the best way to store it.
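As a concrete illustration, here is a minimal sketch of what the source side could look like. It assumes Confluent's MQTT source connector (io.confluent.connect.mqtt.MqttSourceConnector), a Mosquitto broker reachable at tcp://mosquitto:1883, and example topic names; the host, topics, and exact property names are assumptions to verify against the documentation of whichever MQTT connector you install:

    # MQTT source connector -- streams MQTT messages into Kafka
    name=mqtt-vehicle-source
    connector.class=io.confluent.connect.mqtt.MqttSourceConnector

    # Mosquitto broker and the per-vehicle MQTT topics to subscribe to
    mqtt.server.uri=tcp://mosquitto:1883
    mqtt.topics=vehicles/+/telemetry

    # Single Kafka topic that every vehicle's messages land in; this connector
    # carries the originating MQTT topic on the record key, which is what
    # lets the messages be partitioned by vehicle id
    kafka.topic=vehicle_telemetry

    # RegexRouter SMT -- rewrites the outgoing topic name in flight
    # (shown here collapsing whatever the connector produces into one topic)
    transforms=route
    transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
    transforms.route.regex=.*
    transforms.route.replacement=vehicle_telemetry

With something like this in place, all vehicles share one topic, and the record key (derived from the MQTT topic, i.e. the vehicle id) determines which partition each message goes to.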

From there, you can use the data however you want. When it comes to streaming it to S3, you can use the Kafka Connect S3 sink connector and, as cricket_007 mentioned, partition the data by time if it's just volume you're worried about. If you want to route the messages to different buckets, or to different areas of the same bucket, you could use stream processing (e.g. Kafka Streams / ksqlDB) to pre-process the topic and populate other topics.
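A sink configuration along these lines could then write that topic to S3, partitioned by time into hourly prefixes. This sketch assumes Confluent's S3 sink connector, JSON output, and a hypothetical bucket name, so again verify the property names against the connector's documentation:

    # S3 sink connector -- copies the Kafka topic into S3
    name=vehicle-telemetry-s3-sink
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=vehicle_telemetry

    # Target bucket (hypothetical name) and storage/format settings
    s3.bucket.name=my-vehicle-telemetry-bucket
    s3.region=eu-west-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.json.JsonFormat

    # Time-based partitioning: one S3 prefix per hour, based on the record timestamp
    partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
    partition.duration.ms=3600000
    path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
    locale=en
    timezone=UTC
    timestamp.extractor=Record

    # Number of records per S3 object -- a larger value avoids lots of small files
    flush.size=10000

Because objects are rolled by flush.size and partition duration rather than written per message, this also addresses the "lots of small files in S3" concern from the question.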

See here for an example of the MQTT connector.
