ETL with Kafka and MongoDB as a Source

I am just learning about Apache Kafka. My current ETL runs as a batch process, and I now want to move it to stream processing so that the data used for reporting is always up to date. As far as I understand, I can use the MongoDB connector to capture data changes from MongoDB and send them to a Kafka topic. But in my ETL I need to store the processed data in an SQL database. How and where can I process the data sent from MongoDB to a topic and then create a record from it in another database? Can I use an AWS Lambda function to do the processing and record creation? If so, how can I call that function from Kafka?

The short answer to your question is Kafka Connect. The longer answer is Kafka Connect plus stream processing (such as Kafka Streams, ksqlDB, etc.).

Your pipeline would look something like this:

MongoDB → Kafka Connect source connector (capturing change events into a Kafka topic) → stream processing (Kafka Streams, ksqlDB, etc., if the data needs transforming) → Kafka Connect JDBC sink connector → SQL database
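Purely as a sketch of the stream-processing hop in the middle of that pipeline (the topic names, broker address, application id, and the trivial trim() transform below are invented placeholders, not anything from the question), a Kafka Streams application could look along these lines:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class MongoCdcTransform {
    public static void main(String[] args) {
        // Basic Kafka Streams configuration; application id and broker address are placeholders.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "mongo-cdc-etl");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read the raw change events that the MongoDB source connector writes to a topic
        // ("mongo.mydb.orders" is a hypothetical topic name).
        KStream<String, String> changes = builder.stream("mongo.mydb.orders");

        // Apply whatever transformation the ETL needs; here we just drop null values
        // and pass everything else through to a "clean" topic that the sink connector reads.
        changes
            .filter((key, value) -> value != null)
            .mapValues(value -> value.trim())   // placeholder for real business logic
            .to("orders_clean");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The final hop into the SQL database would then be handled by a Kafka Connect sink connector (for example the JDBC sink) reading the cleaned topic, so no AWS Lambda function is needed to create the records.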

Here's a more general overview of using Kafka in ETL, as both a blog and a talk. You can learn more about Kafka Connect in this talk.
